Owen, David ORCID: https://orcid.org/0000-0002-4028-0591, Antypas, Dimosthenis, Hassoulas, Athanasios, Pardinas, Antonio ORCID: https://orcid.org/0000-0001-6845-7590, Espinosa-Anke, Luis ORCID: https://orcid.org/0000-0001-6830-9176 and Camacho Collados, Jose ORCID: https://orcid.org/0000-0003-1618-7239 2023. Enabling early health care intervention by detecting depression in users of web-based forums using Language models: longitudinal analysis and evaluation. JMIR AI 2, e41205. 10.2196/41205
PDF - Published Version. Available under License Creative Commons Attribution.
Abstract
Background: Major Depressive Disorder (MDD) is a common mental disorder that affects 5% of adults worldwide. Early contact with healthcare services is critical in achieving an accurate diagnosis and improving patient outcomes. Key symptoms of MDD (depression hereafter) such as cognitive distortions are observed in verbal communication, and can manifest in the structure of written language as well. Thus, the automatic analysis of text outputs may provide opportunities for early interventions in settings where written communication is rich and regular, such as social media and online forums.

Objective: The objective was twofold. We sought to gauge the effectiveness of different machine learning approaches to identifying users of the mass online forum Reddit who eventually disclose a diagnosis of depression. We then aimed to determine whether the time between a forum post and a depression diagnosis date is a relevant factor in performing this detection.

Methods: Two Reddit datasets containing posts belonging to users with and without a history of depression diagnosis were obtained. An intersection of these datasets provided users with an estimated date of depression diagnosis. This derived dataset was used as input to several machine learning classifiers, including Transformer-based Language Models.

Results: The BERT (Bidirectional Encoder Representations from Transformers) and MentalBERT Transformer-based Language Models proved most effective in distinguishing forum users with a known depression diagnosis from those without. They each obtained a mean F1 score of 0.64 across the experimental setups used for binary classification. The results also suggested that the final 12 to 16 weeks (about 3 to 4 months) of posts prior to a depressed user’s estimated diagnosis date are most indicative of their illness; adding data from before that period did not help the models detect depression more accurately. Furthermore, in the four-to-eight-week period prior to the user’s estimated diagnosis date, their posts exhibited more negative sentiment than in any other four-week period in their post history.

Conclusions: Transformer-based Language Models may be used on data from online social media forums to identify users at risk of psychiatric conditions such as depression. Language features picked up by these classifiers might predate depression onset by weeks to months, enabling proactive mental healthcare interventions to support those at risk of this condition.
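The abstract refers to four-week periods counted back from an estimated diagnosis date and to an F1 score for binary classification. The following is a minimal Python sketch of both ideas, not the authors' implementation: the function names `four_week_window` and `f1_score` are hypothetical, and the choice to treat a post made exactly 28 days before diagnosis as part of the final window is an assumption made here for illustration.

```python
from datetime import date
from typing import Optional, Sequence


def four_week_window(post_date: date, diagnosis_date: date) -> Optional[int]:
    """Index of the four-week window a post falls in, counting back from diagnosis.

    0 = final 4 weeks before diagnosis, 1 = weeks 4 to 8 before, and so on.
    Returns None for posts made on or after the diagnosis date.
    (Hypothetical helper; boundary handling is an assumption.)
    """
    days_before = (diagnosis_date - post_date).days
    if days_before <= 0:
        return None
    # Days 1..28 map to window 0, days 29..56 to window 1, etc.
    return (days_before - 1) // 28


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels, where 1 marks the positive (depression) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    diagnosis = date(2020, 6, 1)  # illustrative date, not from the study
    print(four_week_window(date(2020, 5, 3), diagnosis))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

Under these assumptions, the "four-to-eight-week period" mentioned in the Results corresponds to window index 1, and the final 12 to 16 weeks correspond to window indices 0 through 2 or 0 through 3.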
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Medicine; Computer Science & Informatics; MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG) |
Publisher: | JMIR Publications |
ISSN: | 2817-1705 |
Funders: | MRC |
Date of First Compliant Deposit: | 24 January 2023 |
Date of Acceptance: | 15 January 2023 |
Last Modified: | 17 Jun 2023 17:03 |
URI: | https://orca.cardiff.ac.uk/id/eprint/156203 |