Edwards, Thomas, Jones, Christopher B. ORCID: https://orcid.org/0000-0001-6847-7575 and Corcoran, Padraig ORCID: https://orcid.org/0000-0001-9731-3385 2022. Identifying wildlife observations on twitter. Ecological Informatics 67, 101500. 10.1016/j.ecoinf.2021.101500
PDF - Accepted Post-Print Version (2MB)
Abstract
Despite the potential of social media for environmental monitoring, concerns remain about the quality and reliability of the information automatically extracted. Notably, there are many observations of wildlife on Twitter, but their automated detection is a challenge due to the frequent use of wildlife-related words in messages that have no connection with wildlife observation. We investigate whether, and what type of, supervised machine learning methods can be used to create a fully automated text classification model to identify genuine wildlife observations on Twitter, irrespective of species type or whether Tweets are geo-tagged. We perform experiments with various techniques for building feature vectors that serve as input to the classifiers, and consider how they affect classification performance. We compare three classification approaches and perform an analysis of the types of features that are indicative of genuine wildlife observations on Twitter. In particular, we compare some classical machine learning algorithms, widely used in ecology studies, with state-of-the-art neural network models. Results showed that the neural network-based model Bidirectional Encoder Representations from Transformers (BERT) outperformed the classical methods. Notably, this was the case for a relatively small training corpus, consisting of fewer than 3000 instances. This reflects the fact that the BERT classifier uses a transfer learning approach that benefits from prior learning on a very much larger collection of generic text. BERT performed particularly well even for Tweets that employed specialised language relating to wildlife observations. The analysis of possible indicative features for wildlife Tweets revealed interesting trends in the usage of hashtags that are unrelated to official citizen science campaigns.
The findings from this study facilitate more accurate identification of wildlife-related data on social media, which can in turn be used to enrich citizen science data collections.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Publisher: | Elsevier |
ISSN: | 1574-9541 |
Date of First Compliant Deposit: | 4 March 2022 |
Date of Acceptance: | 24 November 2021 |
Last Modified: | 05 May 2023 02:20 |
URI: | https://orca.cardiff.ac.uk/id/eprint/147972 |
Citation Data
Cited 5 times in Scopus.