Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Exploiting Flickr meta-data for predicting environmental features

Jeawak, Shelan 2019. Exploiting Flickr meta-data for predicting environmental features. PhD Thesis, Cardiff University.
Item availability restricted.

Abstract

The photo-sharing website Flickr has become widely used as an informal information source in disciplines such as geography and ecology. Many recent studies have highlighted that Flickr tags capture valuable ecological information, which can complement more traditional sources. A shortcoming of most existing methods is that they rely on manual interpretation of Flickr content, with little automated exploitation of the associated tags, and therefore fail to exploit the full potential of the data. Automatically extracting and analysing information from unstructured and noisy data remains a hard task.
This research investigates the use of Flickr meta-data for predicting a wide variety of environmental phenomena. In particular, we consider the problem of predicting scenicness, species distribution, land cover, and climate-related features. To this end, we developed several novel machine learning methods that can efficiently utilise Flickr tags as a supplementary source to the structured information that is available from traditional scientific resources.
The first proposed method models locations, and hence infers environmental phenomena, using georeferenced Flickr tags. Our focus was on comparing the predictive power of Flickr tags with that of structured environmental data. This method represents each location as a concatenation of two feature vectors: a bag-of-words representation derived from Flickr and a feature vector encoding the numerical and categorical features obtained from the structured dataset. We found that Flickr was generally competitive with the structured environmental data for prediction, being sometimes better and sometimes worse. However, combining Flickr tags with existing ecological data sources consistently improved the results, which suggests that Flickr can indeed be regarded as complementary to traditional sources.
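The concatenated representation described for the first method can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify how tags are weighted or how categorical features are encoded, so raw tag counts and one-hot encoding are assumptions here, and the vocabulary, feature names, and values are hypothetical toy data.

```python
from collections import Counter

def bag_of_words(tags, vocabulary):
    """Count how often each vocabulary term occurs among a location's tags.
    Raw counts are an assumption; the abstract does not state the weighting."""
    counts = Counter(tags)
    return [float(counts[term]) for term in vocabulary]

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 indicator vector (assumed encoding)."""
    return [1.0 if value == c else 0.0 for c in categories]

def location_vector(tags, numerical, land_cover, vocabulary, land_cover_classes):
    """Concatenate the tag-based vector with the structured feature vector."""
    structured = list(numerical) + one_hot(land_cover, land_cover_classes)
    return bag_of_words(tags, vocabulary) + structured

# Hypothetical toy data, invented for illustration only.
vocabulary = ["beach", "forest", "mountain", "snow"]
land_cover_classes = ["forest", "urban", "water"]

vec = location_vector(
    tags=["forest", "mountain", "forest"],
    numerical=[12.5, 830.0],  # e.g. mean temperature and elevation (assumed features)
    land_cover="forest",
    vocabulary=vocabulary,
    land_cover_classes=land_cover_classes,
)
print(vec)
```

The resulting vector could then be passed to any standard classifier or regressor; dropping either half of the concatenation gives the Flickr-only and structured-only baselines that the comparison above refers to.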
The second method that we propose is based on a collective prediction model, which crucially relies on Flickr tags to define the neighbourhood structure. The use of a collective prediction formulation is motivated by the fact that most environmental features are strongly spatially autocorrelated. While this suggests that geographic distance should play a key role in determining neighbourhoods, we show that considerable gains can be made by additionally taking Flickr tags and traditional data into consideration.
The thesis considers two further novel methods, both based on a low-dimensional vector space representation. The first model, called EGEL (Embedding Geographic Locations), learns vector space embeddings of geographic locations by integrating the textual information derived from Flickr with the numerical and categorical information derived from environmental datasets. We show experimentally that this method improves on bag-of-words representations, especially in cases where structured data are available. This model has been extended by considering a spatiotemporal representation of regions. In particular, we propose a spatiotemporal embedding model, called SPATE (Spatiotemporal Embeddings), which learns a vector space embedding for each geographic region and each month of the year. This allows the model to capture environmental phenomena that may depend on monthly or seasonal variation. Apart from extending our primary model, SPATE also includes a new smoothing method to deal with the sparsity of Flickr tags in this spatiotemporal setting. The experimental results presented in this thesis confirm our hypothesis that Flickr tags contain valuable information that can be used to predict environmental features.

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA76 Computer software
Date of First Compliant Deposit: 14 November 2019
Last Modified: 31 Jul 2020 01:23
URI: http://orca.cardiff.ac.uk/id/eprint/126812
