Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation

Owen, David, Antypas, Dimosthenis, Hassoulas, Athanasios, Pardinas, Antonio, Espinosa-Anke, Luis and Camacho Collados, Jose 2023. Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation. JMIR AI 2, e41205. 10.2196/41205

PDF - Published Version
Available under License Creative Commons Attribution.

License Start date: 24 March 2023


Background: Major Depressive Disorder (MDD) is a common mental disorder that affects 5% of adults worldwide. Early contact with healthcare services is critical in achieving an accurate diagnosis and improving patient outcomes. Key symptoms of MDD (depression hereafter), such as cognitive distortions, are observed in verbal communication and can also manifest in the structure of written language. Thus, the automatic analysis of text outputs may provide opportunities for early interventions in settings where written communication is rich and regular, such as social media and online forums.
Objective: The objective was twofold. We sought to gauge the effectiveness of different machine learning approaches to identifying users of the mass online forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date is a relevant factor in performing this detection.
Methods: Two Reddit datasets containing posts belonging to users with and without a history of depression diagnosis were obtained. An intersection of these datasets provided users with an estimated date of depression diagnosis. This derived dataset was used as input to several machine learning classifiers, including Transformer-based Language Models.
Results: The BERT (Bidirectional Encoder Representations from Transformers) and MentalBERT Transformer-based Language Models proved most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1 score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3 to 4 months) of posts prior to a depressed user’s estimated diagnosis date are most indicative of their illness; data from before that period did not help the models detect depression more accurately. Furthermore, in the four-to-eight-week period prior to the user’s estimated diagnosis date, their posts exhibited more negative sentiment than in any other four-week period in their post history.
Conclusions: Transformer-based Language Models may be used on data from online social media forums to identify users at risk of psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental healthcare interventions to support those at risk of this condition.
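The two quantities the abstract relies on, the four-week window a post falls into relative to the estimated diagnosis date and the F1 score used for binary classification, can be illustrated with a minimal sketch. This is not the authors' code: the function names `four_week_window` and `f1_score` are hypothetical, and the convention that window 0 covers the 28 days immediately before the diagnosis date is an assumption made only for illustration.

```python
from datetime import date


def four_week_window(post_date: date, diagnosis_date: date) -> int:
    """Index of the four-week window a post falls into, counting back
    from the estimated diagnosis date (assumed convention: 0 means
    0 to 27 days before, 1 means 28 to 55 days before, and so on)."""
    days_before = (diagnosis_date - post_date).days
    if days_before < 0:
        raise ValueError("post is dated after the estimated diagnosis")
    return days_before // 28


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under the assumed convention, the four-to-eight-week period mentioned in the Results corresponds to window index 1, and the final 12 to 16 weeks correspond to indices 0 through 2 or 0 through 3.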

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Computer Science & Informatics
MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG)
Publisher: JMIR
ISSN: 2817-1705
Funders: MRC
Date of First Compliant Deposit: 24 January 2023
Date of Acceptance: 15 January 2023
Last Modified: 17 Jun 2023 17:03
