ORCA
Online Research @ Cardiff


A hybrid approach for geo-referencing tweets: transformer language model regression and gazetteer disambiguation

Edwards, Thomas, Corcoran, Padraig and Jones, Christopher B.

2025. A hybrid approach for geo-referencing tweets: transformer language model regression and gazetteer disambiguation. ISPRS International Journal of Geo-Information 14 (9), 321. 10.3390/ijgi14090321

PDF - Published Version
Available under License Creative Commons Attribution.
Download (2MB)

License URL: https://creativecommons.org/licenses/by/4.0/

License Start date: 22 August 2025

Official URL: https://doi.org/10.3390/ijgi14090321

Abstract

Recent approaches to geo-referencing X posts have focused on language modelling techniques that learn geographic region-specific language and use it to infer geographic coordinates from text. These approaches rely on large amounts of labelled data to build accurate predictive models. However, obtaining significant volumes of geo-referenced data from Twitter, recently renamed X, can be difficult. Further, existing language modelling approaches can require the division of a given area into a grid or set of clusters, which can be dataset-specific and challenging for location prediction at a fine-grained level. Regression-based approaches in combination with deep learning address some of these challenges, as they can assign coordinates directly without the need for clustering or grid-based methods. However, such approaches have received only limited attention for the geo-referencing task. In this paper, we adapt state-of-the-art neural network models for the regression task, focusing on geo-referencing wildlife tweets, where only a limited amount of data is available. We experiment with different transfer learning techniques for improving the performance of the regression models, and we also compare our approach to recently developed Large Language Models and prompting techniques. We show that using a location name extraction method in combination with regression-based disambiguation, and pure regression when names are absent, leads to significant improvements in locational accuracy over using only regression.
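
The hybrid decision rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the regression output disambiguates gazetteer candidates, so this sketch assumes the simplest reading: when a place name yields several gazetteer candidates, choose the one closest to the regression estimate, and fall back to the regression estimate when no name is found. The names `haversine_km` and `georeference`, and the approximate coordinates in the usage note, are hypothetical and are not taken from the paper.

```python
import math

Coord = tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def georeference(regression_estimate: Coord, candidates: list[Coord]) -> Coord:
    """Assumed hybrid rule, not the paper's confirmed method.

    candidates: gazetteer coordinates for place names extracted from the
    text (empty when no name was found).
    """
    if not candidates:
        # No place name in the text: use the regression output directly.
        return regression_estimate
    # Ambiguous name: keep the candidate nearest to the regression estimate.
    return min(candidates, key=lambda c: haversine_km(c, regression_estimate))
```

For example, with a regression estimate near Cardiff, `(51.48, -3.18)`, and two illustrative candidates for an ambiguous name, `(50.701, -1.291)` and `(51.588, -2.998)`, the function returns the second candidate because it lies closer to the estimate. With an empty candidate list it returns the regression estimate unchanged.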

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Computer Science & Informatics
Additional Information:	License information from Publisher: LICENSE 1: URL: https://creativecommons.org/licenses/by/4.0/, Start Date: 2025-08-22
Publisher:	MDPI
Date of First Compliant Deposit:	4 September 2025
Date of Acceptance:	15 August 2025
Last Modified:	04 Sep 2025 09:45
URI:	https://orca.cardiff.ac.uk/id/eprint/180887
