Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

The BioWhere Project: unlocking the potential of biological collections data

Stock, Kristin, Wijegunarathna, Kalana, Jones, Christopher B. ORCID: https://orcid.org/0000-0001-6847-7575, Morris, Hone, Das, Pragyan, Medyckyj-Scott, David and Whitehead, Brandon 2023. The BioWhere Project: unlocking the potential of biological collections data. GI_Forum 11 (1), pp. 3-21. 10.1553/giscience2023_01_s3

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Vast numbers of biological specimens (e.g. flora, fauna, soils) are stored in collections globally. Many of these have only a natural-language location description, such as ‘200ft above and south of main highway, 1.1 miles west of Porters Pass’, and numerical coordinates are unknown. The BioWhere project is pioneering methods to automatically determine the geographic coordinates (georeferences) of complex location descriptions. Particular challenges are posed by the variable accuracy of recent and historical data that might be used to train models to predict geographic coordinates from the natural-language descriptions; by the presence of historical place names in the descriptions that are not stored in existing gazetteers; and by the vague and context-sensitive nature of the spatial terms (e.g. above, on, south of) used in the descriptions. We are addressing these challenges by extending the latest transformer-based deep learning models to parse locality descriptions, and by building models for specific spatial terms that incorporate geographic context and data quality to more accurately predict georeferences. We also describe a gazetteer that contains enriched cultural content to support georeferencing of historical records, and to serve as a store of New Zealand Māori cultural knowledge for future generations.
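
To make the georeferencing task concrete, the sketch below shows a naive geometric baseline for one fragment of the example description, ‘1.1 miles west of Porters Pass’. It is not the BioWhere method, which relies on learned models. It treats ‘west’ as an exact bearing of 270 degrees on a spherical Earth, and the reference coordinates are hypothetical placeholders rather than a gazetteer lookup. The function name `offset_point` is likewise an assumption made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius (spherical approximation)
METRES_PER_MILE = 1609.344


def offset_point(lat_deg, lon_deg, bearing_deg, distance_m):
    """Return the (lat, lon) in degrees reached by travelling distance_m
    from the start point along a fixed initial bearing on a sphere."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)
    angular = distance_m / EARTH_RADIUS_M  # distance as an angle in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical reference point standing in for the named place;
# a real pipeline would resolve the place name against a gazetteer.
ref_lat, ref_lon = -43.30, 171.74

# 'west of' read literally as bearing 270 degrees (an assumption).
lat, lon = offset_point(ref_lat, ref_lon, 270.0, 1.1 * METRES_PER_MILE)
print(f"estimated point: {lat:.5f}, {lon:.5f}")
```

A literal reading like this ignores the rest of the description (‘200ft above and south of main highway’), whether the distance was measured along a road or in a straight line, and the uncertainty in the reference point itself, which is the kind of vagueness and context sensitivity the abstract identifies as a challenge.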

Item Type: Article
Date Type: Published Online
Status: Published
Schools: Computer Science & Informatics
Publisher: Verlag der Österreichischen Akademie der Wissenschaften
ISSN: 2308-1708
Date of First Compliant Deposit: 12 June 2024
Date of Acceptance: 15 March 2023
Last Modified: 03 Sep 2024 08:47
URI: https://orca.cardiff.ac.uk/id/eprint/169779
