Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

High-confidence labelling of pathology reports using LLM-based unanimous ensembles with limited data

Greatrix, Thomas, Langbein, Frank C. ORCID: https://orcid.org/0000-0002-3379-0323, Whitaker, Roger M. ORCID: https://orcid.org/0000-0002-8473-1913, Colombo, Gualtiero B. and Turner, Liam ORCID: https://orcid.org/0000-0003-4877-5289 2025. High-confidence labelling of pathology reports using LLM-based unanimous ensembles with limited data. Presented at: International Conference of AI in Healthcare, Cambridge, 8-10 September 2025. Proceedings of the Artificial Intelligence in Healthcare. Lecture Notes in Computer Science Springer, pp. 381-395. 10.1007/978-3-032-00652-3_27

PDF - Accepted Post-Print Version (1MB)

Abstract

Manual labelling of pathology reports is a costly bottleneck for medical data analysis. We propose diverse unanimous ensembles, which combine Large Language Models (LLMs) such as GPT-4o with complementary model architectures, for high-confidence automatic labelling of pathology reports, particularly when labelled training data is scarce. This consensus method yields high precision on an automatically identifiable subset of reports while flagging ambiguous cases for expert review. Applied to the public TCGA-Reports dataset, a GPT-4o and DistilBERT ensemble achieved 95.5% accuracy on the high-confidence subset covering 45.5% of the reports, a 23.1 percentage point increase over the baseline DistilBERT's overall accuracy on the full dataset. This demonstrates the potential for cost-effective data annotation: high-confidence subsets are labelled automatically, reserving human effort for ambiguous cases.
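The consensus rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two keyword-based labellers are hypothetical stand-ins for GPT-4o and DistilBERT, and the function names (`unanimous_label`, `evaluate`), labels and toy reports are invented for the example. A report is labelled automatically only when every model returns the same label; otherwise it is flagged for expert review, and accuracy is measured only on the accepted subset.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A "labeller" is any callable that maps a report's text to a class label.
Labeller = Callable[[str], str]


def unanimous_label(report: str, models: Sequence[Labeller]) -> Optional[str]:
    """Return the label if every model agrees; otherwise None (flag for review)."""
    labels = {model(report) for model in models}
    return labels.pop() if len(labels) == 1 else None


def evaluate(reports: Sequence[str], gold: Sequence[str],
             models: Sequence[Labeller]) -> Tuple[float, float, List[int]]:
    """Return (coverage, accuracy on accepted subset, indices flagged for review)."""
    accepted: List[Tuple[str, str]] = []  # (predicted, true) for auto-labelled reports
    flagged: List[int] = []               # reports left for expert review
    for i, (report, truth) in enumerate(zip(reports, gold)):
        label = unanimous_label(report, models)
        if label is None:
            flagged.append(i)
        else:
            accepted.append((label, truth))
    coverage = len(accepted) / len(reports)
    accuracy = (sum(p == t for p, t in accepted) / len(accepted)
                if accepted else float("nan"))
    return coverage, accuracy, flagged


# Hypothetical stand-ins for the ensemble members (NOT GPT-4o or DistilBERT).
def keyword_model_a(report: str) -> str:
    text = report.lower()
    if "breast" in text:
        return "breast"
    return "lung" if "lung" in text else "other"


def keyword_model_b(report: str) -> str:
    text = report.lower()
    if "ductal" in text or "breast" in text:
        return "breast"
    return "lung" if "adenocarcinoma" in text else "other"


# Invented toy reports and gold labels, for illustration only.
reports = [
    "Invasive ductal carcinoma of the left breast.",
    "Lung adenocarcinoma, right upper lobe.",
    "Adenocarcinoma, site not specified.",
    "Benign lung nodule, no malignancy.",
]
gold = ["breast", "lung", "colon", "other"]

coverage, accuracy, flagged = evaluate(reports, gold, [keyword_model_a, keyword_model_b])
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f} flagged={flagged}")
# prints "coverage=0.50 accuracy=1.00 flagged=[2, 3]"
```

On these four toy reports the two stand-ins agree on only two, so coverage is 0.5 and accuracy on the accepted subset is 1.0, while the other two reports are flagged. This mirrors the trade-off in the abstract: requiring unanimity gives up coverage in exchange for higher precision on the reports that are labelled automatically.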

Item Type: Conference or Workshop Item (Paper)
Date Type: Published Online
Status: Published
Schools: Schools > Computer Science & Informatics
Publisher: Springer
ISBN: 9783032006516
Date of First Compliant Deposit: 14 June 2025
Last Modified: 28 Aug 2025 11:28
URI: https://orca.cardiff.ac.uk/id/eprint/179079

