Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Evaluating double descent in machine learning: insights from tree-based models applied to a genomic prediction task

Cimadevila, Guillermo Comesaña 2025. Evaluating double descent in machine learning: insights from tree-based models applied to a genomic prediction task. [Online]. arXiv: Cornell University. Available at: https://doi.org/10.48550/arXiv.2509.25216

PDF - Submitted Pre-Print Version

Abstract

Classical learning theory describes a well-characterised U-shaped relationship between model complexity and prediction error, reflecting a transition from underfitting in underparameterised regimes to overfitting as complexity grows. Recent work, however, has introduced the notion of a second descent in test error beyond the interpolation threshold, giving rise to the so-called double descent phenomenon. While double descent has been studied extensively in the context of deep learning, it has also been reported in simpler models, including decision trees and gradient boosting. In this work, we revisit these claims through the lens of classical machine learning applied to a biological classification task: predicting isoniazid resistance in Mycobacterium tuberculosis using whole-genome sequencing data. We systematically vary model complexity along two orthogonal axes, learner capacity (e.g., Pleaf, Pboost) and ensemble size (i.e., Pens), and show that double descent consistently emerges only when complexity is scaled jointly across these axes. When either axis is held fixed, generalisation behaviour reverts to classical U- or L-shaped patterns. These results are replicated on a synthetic benchmark and support the unfolding hypothesis, which attributes double descent to the projection of distinct generalisation regimes onto a single complexity axis. Our findings underscore the importance of treating model complexity as a multidimensional construct when analysing generalisation behaviour.

Item Type: Website Content
Date Type: Published Online
Status: Submitted
Schools: Schools > Medicine
Research Institutes & Centres > MRC Centre for Neuropsychiatric Genetics and Genomics (CNGG)
Publisher: Cornell University
Funders: University of Bath
Date of First Compliant Deposit: 10 October 2025
Date of Acceptance: 22 September 2025
Last Modified: 14 Oct 2025 11:45
URI: https://orca.cardiff.ac.uk/id/eprint/181601
