Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Real-time text classification of user-generated content on social media: Systematic review

Rogers, David ORCID: https://orcid.org/0000-0001-8198-5961, Preece, Alun ORCID: https://orcid.org/0000-0003-0349-9057, Innes, Martin ORCID: https://orcid.org/0000-0002-8950-8147 and Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885 2022. Real-time text classification of user-generated content on social media: Systematic review. IEEE Transactions on Computational Social Systems 9 (4), pp. 1154-1166. 10.1109/TCSS.2021.3120138

PDF - Accepted Post-Print Version

Abstract

The aim of this systematic review is to determine the current state of the art in the real-time classification of user-generated content from social media. The focus is on identifying the main characteristics of data used for training and testing, the types of text processing and normalization that are required, the machine learning methods used most commonly, and how these methods compare to one another in terms of classification performance. Relevant studies were selected from subscription-based digital libraries, free-to-access bibliographies, and self-curated repositories, and then screened for relevance, with key information extracted and structured against the following facets: natural language processing (NLP) methods, data characteristics, classification methods, and evaluation results. A total of 25 studies, published between 2014 and 2018 and covering 15 types of classification algorithms, were included in this review. Support vector machines (SVMs), Bayesian classifiers, and decision trees were the most commonly employed algorithms, with neural network approaches emerging more recently. Domain-specific, application programming interface (API)-driven collection is the most prevalent origin of datasets. The reuse of previously published datasets as a means of benchmarking algorithms against other studies is also prevalent. In conclusion, consistent approaches are taken when normalizing social media data for text mining, and traditional text mining techniques are suited to the task of real-time analysis of social media.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Social Sciences (Includes Criminology and Education)
Universities' Police Science Institute (UPSI)
Computer Science & Informatics
Data Innovation Research Institute (DIURI)
Crime and Security Research Institute (CSURI)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 2329-924X
Date of First Compliant Deposit: 22 October 2021
Date of Acceptance: 30 September 2021
Last Modified: 07 Nov 2023 00:50
URI: https://orca.cardiff.ac.uk/id/eprint/143583

Citation Data

Cited 15 times in Scopus.
