Challenges in multi-task learning for fMRI-based diagnosis: Benefits for psychiatric conditions and CNVs would likely require thousands of patients

Harvey, Annabelle, Moreau, Clara A., Kumar, Kuldeep, Huguet, Guillaume, Urchs, Sebastian G.W., Sharmarke, Hanad, Jizi, Khadije, Martin, Charles-Olivier, Younis, Nadine, Tamer, Petra, Martineau, Jean-Louis, Orban, Pierre, Silva, Ana Isabel, Hall, Jeremy, van den Bree, Marianne B.M., Owen, Michael J., Linden, David E.J., Lippé, Sarah, Bearden, Carrie E., Dumas, Guillaume, Jacquemont, Sébastien and Bellec, Pierre 2024. Challenges in multi-task learning for fMRI-based diagnosis: Benefits for psychiatric conditions and CNVs would likely require thousands of patients. Imaging Neuroscience 2, imag-2-00222. 10.1162/imag_a_00222

Available under License Creative Commons Attribution.

License URL: https://creativecommons.org/licenses/by/4.0/

License Start date: 25 June 2024

Official URL: https://doi.org/10.1162/imag_a_00222

Abstract

There is a growing interest in using machine learning (ML) models to perform automatic diagnosis of psychiatric conditions; however, generalising the prediction of ML models to completely independent data can lead to a sharp decrease in performance. Patients with different psychiatric diagnoses have traditionally been studied independently, yet there is a growing recognition of neuroimaging signatures shared across them as well as rare genetic copy number variants (CNVs). In this work, we assess the potential of multi-task learning (MTL) to improve accuracy by characterising multiple related conditions with a single model, making use of information shared across diagnostic categories and exposing the model to a larger and more diverse dataset. As a proof of concept, we first established the efficacy of MTL in a context where there is clearly information shared across tasks: the same target (age or sex) is predicted at different sites of data collection in a large functional magnetic resonance imaging (fMRI) dataset compiled from multiple studies. MTL generally led to substantial gains relative to independent prediction at each site. Performing scaling experiments on the UK Biobank, we observed that performance was highly dependent on sample size: for large sample sizes (N > 6000) sex prediction was better using MTL across three sites (N = K per site) than prediction at a single site (N = 3K), but for small samples (N < 500) MTL was actually detrimental for age prediction. We then used established machine-learning methods to benchmark the diagnostic accuracy of each of the 7 CNVs (N = 19–103) and 4 psychiatric conditions (N = 44–472) independently, replicating the accuracy previously reported in the literature on psychiatric conditions. We observed that MTL hurt performance when applied across the full set of diagnoses, and complementary analyses failed to identify pairs of conditions which would benefit from MTL. Taken together, our results show that if a successful multi-task diagnostic model of psychiatric conditions were to be developed with resting-state fMRI, it would likely require datasets with thousands of patients across different diagnoses.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Medicine
Additional Information:	License information from Publisher: LICENSE 1: URL: https://creativecommons.org/licenses/by/4.0/, Start Date: 2024-06-25
Publisher:	Massachusetts Institute of Technology Press
Date of First Compliant Deposit:	27 August 2025
Date of Acceptance:	10 June 2024
Last Modified:	27 Aug 2025 13:30
URI:	https://orca.cardiff.ac.uk/id/eprint/180691
