Burnap, Peter ORCID: https://orcid.org/0000-0003-0396-633X and Williams, Matthew Leighton ORCID: https://orcid.org/0000-0003-2566-6063 2014. Hate speech, machine classification and statistical modelling of information flows on Twitter: interpretation and communication for policy decision making. Presented at: Internet, Policy & Politics, Oxford, UK, 26 September 2014.
PDF
- Accepted Post-Print Version
Abstract
In 2013, the murder of Drummer Lee Rigby in Woolwich, UK led to an extensive public social media reaction. Given the extreme terrorist motive and public nature of the actions, it was feasible that the public response could include written expressions of hateful and antagonistic sentiment towards a particular race, ethnicity and religion, which can be interpreted as ‘hate speech’. This provided motivation to study the spread of hate speech on Twitter following such a widespread and emotive event. In this paper we present a supervised machine learning text classifier, trained and tested to distinguish between hateful and/or antagonistic responses with a focus on race, ethnicity or religion, and more general responses. We used human annotated data collected from Twitter in the immediate aftermath of Lee Rigby’s murder to train and test the classifier. As “Big Data” is a growing topic of study, and its use in policy and decision making is constantly debated at present, we discuss the use of supervised machine learning tools to classify a sample of “Big Data”, and how the results can be interpreted for use in policy and decision making. The results of the classifier are optimal using a combination of probabilistic, rule-based and spatial-based classifiers with a voted ensemble meta-classifier. We achieve an overall F-measure of 0.95 using features derived from the content of each tweet, including syntactic dependencies between terms to recognise “othering” terms, incitement to respond with antagonistic action, and claims of well-founded or justified discrimination against social groups. We then demonstrate how the results of the classifier can be robustly utilized in a statistical model used to forecast the likely spread of hate speech in a sample of Twitter data.
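As an illustration of two ideas the abstract names, a voted ensemble and the F-measure, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the label strings `"hateful"` and `"general"`, and the toy predictions are all assumptions made for illustration, and the three columns of labels merely stand in for the outputs of a probabilistic, a rule-based and a spatial-based classifier. The score computed here is the F-measure for a single class, which may differ from how the paper's overall figure was derived.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most base classifiers for one item."""
    return Counter(predictions).most_common(1)[0][0]


def f_measure(gold, predicted, positive="hateful"):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one row per tweet, one column per base classifier.
base_outputs = [
    ["hateful", "hateful", "general"],
    ["general", "general", "hateful"],
    ["hateful", "general", "hateful"],
    ["general", "general", "general"],
]
gold = ["hateful", "general", "general", "general"]

# The ensemble label for each tweet is the majority of the three votes.
voted = [majority_vote(row) for row in base_outputs]
score = f_measure(gold, voted)
```

With an odd number of voters and two labels, as here, a vote can never tie; a real ensemble with more labels or an even number of voters would need an explicit tie-breaking rule.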
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Completion |
Status: | Published |
Schools: | Computer Science & Informatics; Social Sciences (Includes Criminology and Education) |
Date of First Compliant Deposit: | 30 March 2016 |
Last Modified: | 19 Nov 2024 16:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/65227 |