Rogers, David ORCID: https://orcid.org/0000-0001-8198-5961, Preece, Alun ORCID: https://orcid.org/0000-0003-0349-9057, Innes, Martin ORCID: https://orcid.org/0000-0002-8950-8147 and Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885 2022. Real-time text classification of user-generated content on social media: Systematic review. IEEE Transactions on Computational Social Systems 9 (4), pp. 1154-1166. 10.1109/TCSS.2021.3120138
PDF - Accepted Post-Print Version (217kB)
Abstract
The aim of this systematic review is to determine the current state of the art in the real-time classification of user-generated content from social media. Focus is on the identification of the main characteristics of data used for training and testing, the types of text processing and normalization that are required, the machine learning methods used most commonly, and how these methods compare to one another in terms of classification performance. Relevant studies were selected from subscription-based digital libraries, free-to-access bibliographies, and self-curated repositories and then screened for relevance, with key information extracted and structured against the following facets: natural language processing (NLP) methods, data characteristics, classification methods, and evaluation results. A total of 25 studies published between 2014 and 2018 covering 15 types of classification algorithms were included in this review. Support vector machines (SVMs), Bayesian classifiers, and decision trees were the most commonly employed algorithms, with the recent emergence of neural network approaches. Domain-specific, application programming interface (API)-driven collection is the most prevalent origin of datasets. The reuse of previously published datasets as a means of benchmarking algorithms against other studies is also prevalent. In conclusion, there are consistent approaches taken when normalizing social media data for text mining, and traditional text mining techniques are suited to the task of real-time analysis of social media.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Social Sciences (Includes Criminology and Education); Universities' Police Science Institute (UPSI); Computer Science & Informatics; Data Innovation Research Institute (DIURI); Crime and Security Research Institute (CSURI) |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Publisher: | Institute of Electrical and Electronics Engineers |
ISSN: | 2329-924X |
Date of First Compliant Deposit: | 22 October 2021 |
Date of Acceptance: | 30 September 2021 |
Last Modified: | 07 Nov 2023 00:50 |
URI: | https://orca.cardiff.ac.uk/id/eprint/143583 |
Citation Data
Cited 15 times in Scopus.