Liu, Han ORCID: https://orcid.org/0000-0002-7731-8258, Burnap, Pete ORCID: https://orcid.org/0000-0003-0396-633X, Alorainy, Wafa and Williams, Matthew L. ORCID: https://orcid.org/0000-0003-2566-6063 2019. A fuzzy approach to text classification with two-stage training for ambiguous instances. IEEE Transactions on Computational Social Systems 6 (2), pp. 227-240. 10.1109/TCSS.2019.2892037
Abstract
Sentiment analysis is a very popular application area of text mining and machine learning. Popular methods include Support Vector Machines, Naive Bayes, Decision Trees and Deep Neural Networks. However, these methods generally belong to discriminative learning, which aims to distinguish one class from the others with a clear-cut outcome in the presence of ground truth. In text classification, instances are naturally fuzzy (and can be multi-labeled in some application areas) and thus cannot be considered clear-cut, especially since labels assigned to sentiment in text represent a level of subjective opinion agreed among multiple human annotators rather than indisputable ground truth. This has motivated researchers to develop fuzzy methods, which typically train classifiers through generative learning, i.e. a fuzzy classifier measures the degree to which an instance belongs to each class. Traditional fuzzy methods typically generate a single fuzzy classifier and employ a fixed defuzzification rule that outputs the class with the maximum membership degree. On sentiment data, a single fuzzy classifier with this fixed rule is likely to encounter text ambiguity, i.e. an instance may obtain equal membership degrees for both the positive and negative classes. In this paper, we focus on cyberhate classification, since the spread of hate speech via social media can have disruptive impacts on social cohesion and lead to regional and community tensions. Automatic detection of cyberhate has thus become a priority research area.
In particular, we propose a modified fuzzy approach with two-stage training for dealing with text ambiguity and classifying four types of hate speech, namely religion, race, disability and sexual orientation, and compare its performance with the popular methods above as well as some existing fuzzy approaches. Features are prepared through the Bag-of-Words and Word Embedding feature extraction methods alongside the correlation-based feature subset selection method. The experimental results show that the proposed fuzzy method outperforms the other methods in most cases.
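The ambiguity described in the abstract can be illustrated with a minimal sketch of maximum-membership defuzzification. This is not the authors' two-stage method, whose details are not given on this page. The function name `defuzzify_max`, the tolerance `tol`, and the membership values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional


def defuzzify_max(memberships: Dict[str, float], tol: float = 1e-9) -> Optional[str]:
    """Return the class with the highest membership degree.

    Returns None when two or more classes share the top degree
    (within `tol`), which is the ambiguous case the abstract describes.
    """
    if not memberships:
        raise ValueError("memberships must not be empty")
    top = max(memberships.values())
    # Collect every class whose degree ties with the maximum.
    winners = [label for label, degree in memberships.items() if top - degree <= tol]
    return winners[0] if len(winners) == 1 else None


# Hypothetical membership degrees for two instances.
clear = {"positive": 0.8, "negative": 0.3}
ambiguous = {"positive": 0.5, "negative": 0.5}

print(defuzzify_max(clear))      # one class dominates
print(defuzzify_max(ambiguous))  # tie: the fixed rule cannot decide
```

A tie returns `None` here rather than an arbitrary class, which makes the ambiguous instances easy to set aside for separate handling.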
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics; Social Sciences (Includes Criminology and Education) |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
ISSN: | 2329-924X |
Funders: | Economic and Social Research Council |
Date of First Compliant Deposit: | 16 January 2019 |
Date of Acceptance: | 1 January 2019 |
Last Modified: | 12 Nov 2024 23:45 |
URI: | https://orca.cardiff.ac.uk/id/eprint/118413 |
Citation Data
Cited 32 times in Scopus.