Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation

Owen, David, Antypas, Dimosthenis, Hassoulas, Athanasios, Pardinas, Antonio, Espinosa-Anke, Luis and Camacho Collados, Jose 2023. Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation. JMIR AI 2, e41205. 10.2196/41205

PDF - Published Version
Available under License Creative Commons Attribution.

License URL: http://creativecommons.org/licenses/by/4.0/

License Start date: 24 March 2023

Official URL: https://doi.org/10.2196/41205

Abstract

Background: Major Depressive Disorder (MDD) is a common mental disorder that affects 5% of adults worldwide. Early contact with healthcare services is critical in achieving an accurate diagnosis and improving patient outcomes. Key symptoms of MDD (depression hereafter) such as cognitive distortions are observed in verbal communication, which can manifest in the structure of written language as well. Thus, the automatic analysis of text outputs may provide opportunities for early interventions in settings where written communication is rich and regular, such as social media and online forums.

Objective: The objective was twofold. We sought to gauge the effectiveness of different machine learning approaches to identifying users of the mass online forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date is a relevant factor in performing this detection.

Methods: Two Reddit datasets containing posts belonging to users with and without a history of depression diagnosis were obtained. An intersection of these datasets provided users with an estimated date of depression diagnosis. This derived dataset was used as input to several machine learning classifiers, including Transformer-based Language Models.

Results: The BERT (Bidirectional Encoder Representations from Transformers) and MentalBERT Transformer-based Language Models proved most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1 score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3 to 4 months) of posts prior to a depressed user’s estimated diagnosis date are most indicative of their illness, and that data from before that period did not improve the models' detection accuracy. Furthermore, in the four-to-eight-week period prior to the user’s estimated diagnosis date, their posts exhibited more negative sentiment than in any other four-week period in their post history.

Conclusions: Transformer-based Language Models may be used on data from online social media forums to identify users at risk of psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental healthcare interventions to support those at risk of this condition.
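The abstract refers to four-week periods counted back from an estimated diagnosis date and to an F1 score for binary classification. As a minimal sketch of those two ideas only, and not the authors' pipeline, the Python below assigns a post to a four-week window and computes F1 from true and predicted labels. The function names, the 28-day window boundaries, and the label encoding (1 = depression diagnosis) are assumptions made for illustration; this record does not describe the paper's actual windowing scheme.

```python
from datetime import date
from typing import Optional, Sequence


def four_week_window(post_date: date, diagnosis_date: date) -> Optional[int]:
    """Return the index of the four-week window a post falls in.

    Hypothetical convention: 0 covers days 0-27 before the estimated
    diagnosis date, 1 covers days 28-55, and so on. Posts made after
    the diagnosis date return None.
    """
    days_before = (diagnosis_date - post_date).days
    if days_before < 0:
        return None
    return days_before // 28


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this convention, the four-to-eight-week period mentioned in the abstract corresponds to window index 1, and the final 12 to 16 weeks correspond to window index 3.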

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Professional Services > Advanced Research Computing @ Cardiff (ARCCA); Research Institutes & Centres > MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG); Schools > Computer Science & Informatics; Schools > Medicine
Publisher:	JMIR Publications
ISSN:	2817-1705
Funders:	MRC
Date of First Compliant Deposit:	24 January 2023
Date of Acceptance:	15 January 2023
Last Modified:	27 Jun 2025 15:32
URI:	https://orca.cardiff.ac.uk/id/eprint/156203
