Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample

Alorainy, Wafa, Burnap, Pete ORCID: https://orcid.org/0000-0003-0396-633X, Liu, Han ORCID: https://orcid.org/0000-0002-7731-8258, Javed, Amir ORCID: https://orcid.org/0000-0001-9761-0945 and Williams, Matthew ORCID: https://orcid.org/0000-0003-2566-6063 2018. Suspended accounts: A source of Tweets with disgust and anger emotions for augmenting hate speech data sample. Presented at: International Conference on Machine Learning and Cybernetics, Chengdu, China, 15-18 July 2018. 2018 International Conference on Machine Learning and Cybernetics (ICMLC). IEEE, pp. 581-586. 10.1109/ICMLC.2018.8527001

PDF - Accepted Post-Print Version

Abstract

In this paper we propose a way to address the cost and unreliability of human annotation, on which the detection of hate speech in web content depends. In particular, we propose using the text produced by suspended accounts in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by an emotion analysis of three sources of data: suspended, active and neutral. The first two contain hateful tweets from suspended accounts and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts show more disgust, negative, fear and sadness emotions than those from active accounts, although tweets from both types of account might be annotated as hateful by human annotators. We train two Random Forest classifiers on the semantic meaning of tweets from suspended and active accounts, respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
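The comparison described in the abstract, two classifiers scored on the same unseen data by F-score, can be sketched as follows. This is a minimal illustration only: the labels and predictions are invented toy values, not data or results from the paper, the names `pred_suspended_model` and `pred_active_model` are hypothetical, and macro-averaging is an assumption, since the abstract does not say how the "overall F-score" was averaged.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int) -> float:
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (an assumed reading of 'overall F-score')."""
    labels = sorted(set(y_true))
    return sum(f1_score(y_true, y_pred, label) for label in labels) / len(labels)


# Toy held-out labels: 1 = hateful, 0 = neutral. Invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictions from two models evaluated on the same held-out set.
pred_suspended_model = [1, 1, 1, 0, 0, 0, 0, 1]
pred_active_model = [1, 0, 0, 0, 0, 0, 0, 1]

score_suspended = macro_f1(y_true, pred_suspended_model)
score_active = macro_f1(y_true, pred_active_model)
difference = score_suspended - score_active

print(f"suspended-trained: {score_suspended:.3f}")
print(f"active-trained:    {score_active:.3f}")
print(f"difference:        {difference:.3f}")
```

Scoring both models against the same `y_true` is what makes the difference meaningful; in practice the predictions would come from the two trained classifiers rather than hand-written lists.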

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Social Sciences (Includes Criminology and Education); Computer Science & Informatics
Publisher: IEEE
ISBN: 9781538652145
Date of First Compliant Deposit: 3 July 2018
Date of Acceptance: 17 May 2018
Last Modified: 05 Jan 2024 06:17
URI: https://orca.cardiff.ac.uk/id/eprint/112920

Citation Data

Cited 14 times in Scopus.
