ORCA
Online Research @ Cardiff


Identifying wildlife observations on twitter

Edwards, Thomas, Jones, Christopher B. and Corcoran, Padraig

2022. Identifying wildlife observations on twitter. Ecological Informatics 67, 101500. 10.1016/j.ecoinf.2021.101500

PDF - Accepted Post-Print Version

Official URL: http://dx.doi.org/10.1016/j.ecoinf.2021.101500

Abstract

Despite the potential of social media for environmental monitoring, concerns remain about the quality and reliability of the information automatically extracted from it. Notably, there are many observations of wildlife on Twitter, but detecting them automatically is a challenge because wildlife-related words are frequently used in messages that have no connection with wildlife observation. We investigate whether supervised machine learning methods, and if so which types, can be used to create a fully automated text classification model that identifies genuine wildlife observations on Twitter, irrespective of species type or whether Tweets are geo-tagged. We experiment with various techniques for building the feature vectors that serve as input to the classifiers, and consider how they affect classification performance. We compare three classification approaches and analyse the types of features that are indicative of genuine wildlife observations on Twitter. In particular, we compare some classical machine learning algorithms, widely used in ecology studies, with state-of-the-art neural network models. The results showed that the neural network-based model Bidirectional Encoder Representations from Transformers (BERT) outperformed the classical methods. Notably, this was the case even for a relatively small training corpus of fewer than 3000 instances. This reflects the fact that the BERT classifier uses a transfer learning approach that benefits from prior learning on a much larger collection of generic text. BERT performed particularly well even for Tweets that used specialised language relating to wildlife observations. The analysis of possible indicative features for wildlife Tweets revealed interesting trends in the use of hashtags that are unrelated to official citizen science campaigns.
The findings from this study facilitate more accurate identification of wildlife-related data on social media, which can in turn be used to enrich citizen science data collections.
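To make the idea of a "classical" feature-vector classifier concrete, the following is a minimal sketch, not the authors' pipeline. The abstract does not name the classical algorithms or feature schemes used, so this example assumes a plain bag-of-words representation (hashtags kept as separate tokens) and a multinomial Naive Bayes classifier as a stand-in. The training Tweets, and the names `TRAIN`, `tokenize`, `NaiveBayes` and `most_indicative`, are hypothetical and invented for illustration; no study data or results are reproduced.

```python
import math
import re
from collections import Counter

# Hypothetical toy Tweets (invented, NOT data from the study).
# Label 1 = genuine wildlife observation, 0 = wildlife word used in another sense.
TRAIN = [
    ("Just saw a red kite circling over the field #birdwatching", 1),
    ("Spotted a hedgehog in the garden tonight #wildlife", 1),
    ("Two otters on the river this morning #wildlife", 1),
    ("The Bears lost again, what a game", 0),
    ("Feeling like a busy bee at work today", 0),
    ("Red Kite brewery has a new beer out", 0),
]

TOKEN_RE = re.compile(r"#?\w+")


def tokenize(text):
    """Lower-case word tokens; hashtags keep their '#' so they are distinct features."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs):
        self.class_docs = Counter()
        self.token_counts = {0: Counter(), 1: Counter()}
        for text, label in docs:
            self.class_docs[label] += 1
            self.token_counts[label].update(tokenize(text))
        self.vocab = set(self.token_counts[0]) | set(self.token_counts[1])
        self.totals = {c: sum(self.token_counts[c].values()) for c in (0, 1)}
        return self

    def log_likelihood(self, token, label):
        # Laplace smoothing so a token unseen in one class does not zero it out.
        count = self.token_counts[label][token]
        return math.log((count + 1) / (self.totals[label] + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.class_docs.values())
        scores = {}
        for c in (0, 1):
            score = math.log(self.class_docs[c] / n_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += self.log_likelihood(tok, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def most_indicative(self, k=5):
        """Tokens with the largest log-probability ratio in favour of class 1."""
        ratio = {t: self.log_likelihood(t, 1) - self.log_likelihood(t, 0)
                 for t in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:k]
```

On this toy data, `NaiveBayes().fit(TRAIN)` labels "Saw an otter on the river #wildlife" as an observation and "The Bears game was great" as not one, and `most_indicative(k=1)` returns `['#wildlife']`, loosely mirroring the kind of hashtag analysis described above. The transfer-learning BERT approach that the abstract reports as best-performing would instead fine-tune a pretrained transformer and is not reproduced here.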

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Computer Science & Informatics
Publisher:	Elsevier
ISSN:	1574-9541
Date of First Compliant Deposit:	4 March 2022
Date of Acceptance:	24 November 2021
Last Modified:	05 May 2023 02:20
URI:	https://orca.cardiff.ac.uk/id/eprint/147972

Citation Data

Cited 5 times in Scopus.
