Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 
WelshClear Cookie - decide language by browser settings

Georeferencing Wikipedia documents using data from social media sources

Van Laere, Olivier, Schockaert, Steven ORCID:, Tanasescu, Vlad, Dhoedt, Bart and Jones, Christopher B. ORCID: 2014. Georeferencing Wikipedia documents using data from social media sources. ACM Transactions on Information Systems 32 (3) , 12. 10.1145/2629685

Full text not available from this repository.


Social media sources such as Flickr and Twitter continuously generate large amounts of textual information (tags on Flickr and short messages on Twitter). This textual information is increasingly linked to geographical coordinates, which makes it possible to learn how people refer to places by identifying correlations between the occurrence of terms and the locations of the corresponding social media objects. Recent work has focused on how this potentially rich source of geographic information can be used to estimate geographic coordinates for previously unseen Flickr photos or Twitter messages. In this article, we extend this work by analysing to what extent probabilistic language models trained on Flickr and Twitter can be used to assign coordinates to Wikipedia articles. Our results show that exploiting these language models substantially outperforms both (i) classical gazetteer-based methods (in particular, using Yahoo! Placemaker and Geonames) and (ii) language modelling approaches trained on Wikipedia alone. This supports the hypothesis that social media are important sources of geographic information, which are valuable beyond the scope of individual applications.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Publisher: Association for Computing Machinery (ACM)
ISSN: 1046-8188
Last Modified: 27 Oct 2022 09:30

Citation Data

Cited 16 times in Scopus. View in Scopus. Powered By Scopus® Data

Actions (repository staff only)

Edit Item Edit Item