Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Multi-class machine classification of suicide-related communication on Twitter

Burnap, Pete, Colombo, Gualtiero, Amery, Rosie, Hodorog, Andrei and Scourfield, Jonathan 2017. Multi-class machine classification of suicide-related communication on Twitter. Online Social Networks and Media 2, pp. 32-44. 10.1016/j.osnem.2017.08.001

PDF - Published Version
Available under License Creative Commons Attribution.



The World Wide Web, and online social networks in particular, have increased connectivity between people such that information can spread to millions of people in a matter of minutes. This form of online collective contagion has provided many benefits to society, such as providing reassurance and emergency management in the immediate aftermath of natural disasters. However, it also poses a potential risk to vulnerable Web users who receive this information and could subsequently come to harm. One example is the spread of suicidal ideation in online social networks, about which concerns have been raised. In this paper we report the results of a number of machine classifiers built with the aim of classifying text relating to suicide on Twitter. The classifiers distinguish between the more worrying content, such as suicidal ideation, and other suicide-related topics such as reporting of a suicide, memorial, campaigning and support. They also aim to identify flippant references to suicide. We built a set of baseline classifiers using lexical, structural, emotive and psychological features extracted from Twitter posts. We then improved on the baseline classifiers by building an ensemble classifier using the Rotation Forest algorithm and a Maximum Probability voting classification decision method, based on the outcome of base classifiers. This achieved an F-measure of 0.728 overall (for 7 classes, including suicidal ideation) and 0.69 for the suicidal ideation class. We summarise the results by reflecting on the most significant predictive principal components of the suicidal ideation class to provide insight into the language used on Twitter to express suicidal ideation. Finally, we perform a 12-month case study of suicide-related posts in which we further evaluate the classification approach, showing sustained classification performance and providing anonymous insights into the trends and demographic profile of Twitter users posting content of this type.
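The Maximum Probability voting decision mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common reading of the rule, in which each base classifier outputs a probability per class, the highest probability any classifier assigns to each class is kept, and the class with the largest such value wins. The function name `max_probability_vote`, the three class labels and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def max_probability_vote(distributions: List[Dict[str, float]]) -> str:
    """Pick the class with the single highest probability from any base classifier.

    Assumption: each dict maps a class label to that classifier's probability.
    """
    if not distributions:
        raise ValueError("need at least one base classifier output")

    # For every class, keep the largest probability seen across classifiers.
    best: Dict[str, float] = {}
    for dist in distributions:
        for label, prob in dist.items():
            best[label] = max(best.get(label, 0.0), prob)

    # The winning class is the one whose best probability is largest.
    return max(best, key=lambda label: best[label])


# Hypothetical outputs of three base classifiers for one post (made-up numbers).
example = [
    {"ideation": 0.40, "flippant": 0.50, "memorial": 0.10},
    {"ideation": 0.35, "flippant": 0.55, "memorial": 0.10},
    {"ideation": 0.90, "flippant": 0.05, "memorial": 0.05},
]

predicted = max_probability_vote(example)
print(predicted)
```

In this made-up example a simple majority vote would choose "flippant" (two of the three classifiers rank it first), whereas the maximum-probability rule chooses "ideation" because one classifier is highly confident. That difference is the reason one might prefer this rule when a single confident base classifier should be able to override the others.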

Item Type: Article
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Social Sciences (Includes Criminology and Education)
Data Innovation Research Institute (DIURI)
Subjects: H Social Sciences > H Social Sciences (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher: Elsevier
ISSN: 2468-6964
Funders: Department of Health
Date of First Compliant Deposit: 5 September 2017
Date of Acceptance: 8 August 2017
Last Modified: 19 Feb 2024 06:26

Citation Data

Cited 95 times in Scopus.
