Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

A fuzzy approach to text classification with two-stage training for ambiguous instances

Liu, Han ORCID: https://orcid.org/0000-0002-7731-8258, Burnap, Pete ORCID: https://orcid.org/0000-0003-0396-633X, Alorainy, Wafa and Williams, Matthew L. ORCID: https://orcid.org/0000-0003-2566-6063 2019. A fuzzy approach to text classification with two-stage training for ambiguous instances. IEEE Transactions on Computational Social Systems 6 (2), pp. 227-240. 10.1109/TCSS.2019.2892037

PDF - Accepted Post-Print Version

Abstract

Sentiment analysis is a very popular application area of text mining and machine learning. Popular methods include Support Vector Machines, Naive Bayes, Decision Trees and Deep Neural Networks. However, these methods generally belong to discriminative learning, which aims to distinguish one class from others with a clear-cut outcome in the presence of ground truth. In text classification, instances are naturally fuzzy (and can be multi-labeled in some application areas) and thus are not clear-cut, especially since labels assigned to sentiment in text represent an agreed level of subjective opinion among multiple human annotators rather than indisputable ground truth. This has motivated researchers to develop fuzzy methods, which typically train classifiers through generative learning, i.e. a fuzzy classifier is used to measure the degree to which an instance belongs to each class. Traditional fuzzy methods typically generate a single fuzzy classifier and employ a fixed defuzzification rule that outputs the class with the maximum membership degree. On sentiment data, a single fuzzy classifier with this fixed rule is likely to encounter text ambiguity, i.e. an instance may obtain equal membership degrees to both the positive and negative classes. In this paper, we focus on cyberhate classification, since the spread of hate speech via social media can have disruptive impacts on social cohesion and lead to regional and community tensions. Automatic detection of cyberhate has thus become a priority research area. In particular, we propose a modified fuzzy approach with two-stage training for dealing with text ambiguity and classifying four types of hate speech, namely religion, race, disability and sexual orientation, and compare its performance with the popular methods above as well as some existing fuzzy approaches. Features are prepared through the Bag-of-Words and Word Embedding feature extraction methods alongside the correlation-based feature subset selection method. The experimental results show that the proposed fuzzy method outperforms the other methods in most cases.
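
The ambiguity problem described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two training stages are combined, so the sketch simply assumes that a second-stage classifier is consulted whenever the first stage produces tied membership degrees. The function names, the tolerance `tol`, and the class labels are all hypothetical.

```python
from typing import Callable, Dict, Optional

Memberships = Dict[str, float]  # class label -> membership degree in [0, 1]


def defuzzify_max(memberships: Memberships, tol: float = 1e-9) -> Optional[str]:
    """Fixed defuzzification rule: return the class with the maximum
    membership degree, or None when the top two degrees are tied."""
    if not memberships:
        raise ValueError("memberships must not be empty")
    ranked = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and abs(ranked[0][1] - ranked[1][1]) <= tol:
        return None  # ambiguous instance: no single class wins
    return ranked[0][0]


def classify_two_stage(
    instance: str,
    first_stage: Callable[[str], Memberships],
    second_stage: Callable[[str], Memberships],
    tol: float = 1e-9,
) -> Optional[str]:
    """Assumed routing (not taken from the paper): use the first-stage
    classifier unless its output is ambiguous, then ask the second stage."""
    label = defuzzify_max(first_stage(instance), tol)
    if label is not None:
        return label
    # Still None if the second stage is ambiguous as well.
    return defuzzify_max(second_stage(instance), tol)


if __name__ == "__main__":
    # Hypothetical classifiers returning fixed membership degrees.
    stage_one = lambda text: {"hate": 0.5, "non-hate": 0.5}
    stage_two = lambda text: {"hate": 0.2, "non-hate": 0.8}
    print(classify_two_stage("example post", stage_one, stage_two))
```

In this toy run the first stage returns equal degrees for both classes, so the decision is deferred to the second stage. A real system would obtain the membership degrees from trained fuzzy classifiers rather than fixed dictionaries.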

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics; Social Sciences (Includes Criminology and Education)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
ISSN: 2329-924X
Funders: Economic and Social Research Council
Date of First Compliant Deposit: 16 January 2019
Date of Acceptance: 1 January 2019
Last Modified: 12 Nov 2024 23:45
URI: https://orca.cardiff.ac.uk/id/eprint/118413

Citation Data

Cited 32 times in Scopus.
