Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

High-confidence labelling of pathology reports using LLM-based unanimous ensembles with limited data

Greatrix, Thomas, Langbein, Frank C. ORCID: https://orcid.org/0000-0002-3379-0323, Whitaker, Roger M. ORCID: https://orcid.org/0000-0002-8473-1913, Colombo, Gualtiero B. and Turner, Liam ORCID: https://orcid.org/0000-0003-4877-5289 2025. High-confidence labelling of pathology reports using LLM-based unanimous ensembles with limited data. Presented at: International Conference of AI in Healthcare, Cambridge, 8-10 September 2025.
Item availability restricted.

PDF - Accepted Post-Print Version
Restricted to Repository staff only until 11 September 2026 due to copyright restrictions.
Official URL: https://aiih.cc

Abstract

Manual labelling of pathology reports is a costly bottleneck for medical data analysis. We propose diverse unanimous ensembles, which integrate Large Language Models (LLMs) such as GPT-4o with complementary model architectures, for high-confidence automatic labelling of pathology reports, particularly when labelled training data are scarce. This consensus method yields high precision on an automatically identifiable subset while flagging ambiguous cases that require expert review. Applied to the public TCGA-Reports dataset, a GPT-4o and DistilBERT ensemble achieved 95.5% accuracy on the 45.5% subset of reports it labelled automatically, a 23.1 percentage point increase over the baseline DistilBERT’s overall accuracy on the full dataset. This demonstrates the potential for cost-effective data annotation: high-confidence subsets are labelled automatically, and human effort is reserved for ambiguous cases.
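
The consensus rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `unanimous_label` and `coverage_and_accuracy` are hypothetical, the labels and votes are invented toy data, and it assumes each model's predictions are already available as plain labels. A label is accepted only when every model agrees; otherwise the report is flagged for expert review.

```python
from typing import Hashable, Optional, Sequence, Tuple


def unanimous_label(votes: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the shared label if all models agree, else None (flag for review)."""
    if not votes:
        return None
    first = votes[0]
    return first if all(v == first for v in votes) else None


def coverage_and_accuracy(
    per_report_votes: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
) -> Tuple[float, float]:
    """Fraction of reports labelled automatically, and accuracy on that subset."""
    decided = [(unanimous_label(v), g) for v, g in zip(per_report_votes, gold)]
    accepted = [(p, g) for p, g in decided if p is not None]
    coverage = len(accepted) / len(gold) if gold else 0.0
    accuracy = (
        sum(p == g for p, g in accepted) / len(accepted) if accepted else 0.0
    )
    return coverage, accuracy


# Toy data, invented for illustration only (not taken from the paper).
# Each tuple holds one report's predictions from two hypothetical models.
votes = [("A", "A"), ("B", "C"), ("C", "C"), ("A", "B")]
gold = ["A", "B", "C", "A"]

coverage, accuracy = coverage_and_accuracy(votes, gold)
print(f"coverage={coverage:.2f}, accuracy on agreed subset={accuracy:.2f}")
```

In this sketch the two disagreeing reports are excluded from the accuracy figure and would be routed to a human annotator, which mirrors the trade-off the abstract reports between coverage of the dataset and accuracy on the automatically labelled subset.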

Item Type: Conference or Workshop Item (Paper)
Status: In Press
Schools: Schools > Computer Science & Informatics
Date of First Compliant Deposit: 14 June 2025
Last Modified: 04 Jul 2025 08:45
URI: https://orca.cardiff.ac.uk/id/eprint/179079
