Van Laere, Olivier, Schockaert, Steven ORCID: https://orcid.org/0000-0002-9256-2881, Tanasescu, Vlad, Dhoedt, Bart and Jones, Christopher B. ORCID: https://orcid.org/0000-0001-6847-7575 2014. Georeferencing Wikipedia documents using data from social media sources. ACM Transactions on Information Systems 32 (3), 12. 10.1145/2629685
Abstract
Social media sources such as Flickr and Twitter continuously generate large amounts of textual information (tags on Flickr and short messages on Twitter). This textual information is increasingly linked to geographical coordinates, which makes it possible to learn how people refer to places by identifying correlations between the occurrence of terms and the locations of the corresponding social media objects. Recent work has focused on how this potentially rich source of geographic information can be used to estimate geographic coordinates for previously unseen Flickr photos or Twitter messages. In this article, we extend this work by analysing to what extent probabilistic language models trained on Flickr and Twitter can be used to assign coordinates to Wikipedia articles. Our results show that exploiting these language models substantially outperforms both (i) classical gazetteer-based methods (in particular, using Yahoo! Placemaker and Geonames) and (ii) language modelling approaches trained on Wikipedia alone. This supports the hypothesis that social media are important sources of geographic information, which are valuable beyond the scope of individual applications.
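To make the idea of a location language model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed one-degree grid, a multinomial naive Bayes model with add-one smoothing, and a handful of invented toy photos; the function names (`cell_of`, `train`, `predict`) and all data are hypothetical. It learns which terms occur in which grid cell from geotagged items, then assigns an unseen document to the centre of the most probable cell.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a grid cell index (assumed fixed-size grid)."""
    return (math.floor(lat / size), math.floor(lon / size))


def train(items, size=1.0):
    """Count terms per cell from (terms, lat, lon) training items."""
    term_counts = defaultdict(Counter)  # cell -> term frequencies
    cell_counts = Counter()             # cell -> number of training items
    for terms, lat, lon in items:
        cell = cell_of(lat, lon, size)
        cell_counts[cell] += 1
        term_counts[cell].update(terms)
    return term_counts, cell_counts


def predict(terms, term_counts, cell_counts, size=1.0, alpha=1.0):
    """Return the centre of the cell with the highest posterior score.

    Uses a log prior from item counts plus smoothed log term likelihoods.
    Terms never seen in training are ignored. Requires a non-empty model.
    """
    vocab = {t for counts in term_counts.values() for t in counts}
    total_items = sum(cell_counts.values())
    best_cell, best_score = None, -math.inf
    for cell, counts in term_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(cell_counts[cell] / total_items)
        for t in terms:
            if t in vocab:
                score += math.log((counts[t] + alpha) / denom)
        if score > best_score:
            best_cell, best_score = cell, score
    return ((best_cell[0] + 0.5) * size, (best_cell[1] + 0.5) * size)


# Invented toy data standing in for geotagged social media items.
toy_photos = [
    (["eiffel", "tower", "paris"], 48.858, 2.294),
    (["louvre", "paris"], 48.861, 2.336),
    (["bigben", "london"], 51.501, -0.125),
    (["thames", "london", "bridge"], 51.506, -0.075),
]
term_counts, cell_counts = train(toy_photos)

# Terms of a hypothetical article about a Paris museum.
estimate = predict(["paris", "museum", "louvre"], term_counts, cell_counts)
```

Here the estimate is only as precise as the grid cell, so the cell size trades spatial resolution against the amount of training text available per cell.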
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Publisher: | Association for Computing Machinery (ACM) |
ISSN: | 1046-8188 |
Last Modified: | 27 Oct 2022 09:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/66165 |
Citation Data
Cited 16 times in Scopus.