Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A hybrid approach for geo-referencing tweets: transformer language model regression and gazetteer disambiguation

Edwards, Thomas, Corcoran, Padraig ORCID: https://orcid.org/0000-0001-9731-3385 and Jones, Christopher B. ORCID: https://orcid.org/0000-0001-6847-7575 2025. A hybrid approach for geo-referencing tweets: transformer language model regression and gazetteer disambiguation. ISPRS International Journal of Geo-Information 14 (9), 321. 10.3390/ijgi14090321

PDF - Published Version (ijgi-14-00321-v2.pdf)
Available under License Creative Commons Attribution.

License URL: https://creativecommons.org/licenses/by/4.0/
License Start date: 22 August 2025

Abstract

Recent approaches to geo-referencing posts on X (formerly Twitter) have focused on language modelling techniques that learn region-specific language and use it to infer geographic coordinates from text. These approaches rely on large amounts of labelled data to build accurate predictive models. However, obtaining significant volumes of geo-referenced data from X can be difficult. Further, existing language modelling approaches can require dividing a given area into a grid or set of clusters, which can be dataset-specific and makes fine-grained location prediction challenging. Regression-based approaches in combination with deep learning address some of these challenges, as they can assign coordinates directly without the need for clustering or grid-based methods. However, such approaches have received only limited attention for the geo-referencing task. In this paper, we adapt state-of-the-art neural network models for the regression task, focusing on geo-referencing wildlife tweets, for which only a limited amount of data is available. We experiment with different transfer learning techniques for improving the performance of the regression models, and we also compare our approach to recently developed Large Language Models and prompting techniques. We show that using a location name extraction method in combination with regression-based disambiguation when place names are present, and pure regression when names are absent, leads to significant improvements in locational accuracy over using regression alone.
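
The hybrid decision rule summarised in the abstract can be illustrated with a minimal sketch. This is only one plausible reading and not the authors' implementation: it assumes the regression model has already produced a coordinate estimate, that a gazetteer lookup has returned candidate coordinates for any place name found in the text, and that disambiguation means picking the candidate closest to the regression estimate. The function names `haversine_km` and `hybrid_georeference` are hypothetical, and the coordinates in the usage note are approximate and purely illustrative.

```python
import math

# A coordinate is a (latitude, longitude) pair in decimal degrees.
Coord = tuple[float, float]


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def hybrid_georeference(regression_estimate: Coord,
                        candidates: list[Coord]) -> Coord:
    """Assumed rule: use the gazetteer candidate nearest to the regression
    estimate; fall back to the regression estimate when no place name
    (and hence no candidate) was found in the text."""
    if not candidates:
        return regression_estimate
    return min(candidates, key=lambda c: haversine_km(regression_estimate, c))


# Illustrative, approximate coordinates for two places called "Newport".
newport_wales = (51.58, -3.00)
newport_isle_of_wight = (50.70, -1.29)
estimate = (51.48, -3.18)  # hypothetical regression output

chosen = hybrid_georeference(estimate, [newport_isle_of_wight, newport_wales])
fallback = hybrid_georeference(estimate, [])
```

Here `chosen` is the Welsh candidate, because it lies closer to the hypothetical regression estimate, and `fallback` is the regression estimate itself, because no candidates were supplied.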

Item Type: Article
Date Type: Publication
Status: Published
Schools: Schools > Computer Science & Informatics
Additional Information: License information from Publisher: LICENSE 1: URL: https://creativecommons.org/licenses/by/4.0/, Start Date: 2025-08-22
Publisher: MDPI
Date of First Compliant Deposit: 4 September 2025
Date of Acceptance: 15 August 2025
Last Modified: 04 Sep 2025 09:45
URI: https://orca.cardiff.ac.uk/id/eprint/180887
