Liao, Yuxiang, Xiang, Haishan, Liu, Hantao ORCID: https://orcid.org/0000-0003-4544-3481 and Spasić, Irena ORCID: https://orcid.org/0000-0002-8132-3885 2024. Using information extraction to normalize the training data for automatic radiology report generation. IEEE Access 10.1109/ACCESS.2024.3504378
PDF: Accepted Post-Print Version (2MB), available under a Creative Commons Attribution license.
Abstract
High lexico-syntactic variation across radiology reports, even when they convey the same diagnostic information, complicates evaluation and hence the training of deep learning models for Automatic Radiology Report Generation. This problem can be addressed by (1) developing an internal standard for the structured representation of radiology reports, (2) automatically converting radiology reports to the structured representation prior to training, (3) training a deep learning model to generate a structured radiology report from an image, and finally (4) converting the structured report into a narrative one. In this study, we focus specifically on steps (1) and (2). First, we proposed a structured radiology report scheme based upon RadGraph, which serves to formally represent the clinical entities, attributes and relations discussed in a radiology report. Using the new scheme, we manually annotated a total of 550 MIMIC-CXR reports for model training and evaluation and 50 CheXpert reports for evaluating the model’s generalization ability. We developed a joint entity and relation model and proposed a novel auxiliary component to enhance the model performance by interpreting token-level information. Using the annotated data, we trained the model to automatically convert information from a narrative radiology report into the structured representation. The model achieved a micro-F1 of 96.6% and 96.1% on named entity recognition, 94.0% and 89.8% on entity attribute recognition, and 89.5% and 86.6% on relation extraction, on the MIMIC-CXR and CheXpert test sets, respectively. We then used this model to automatically annotate 227,835 MIMIC-CXR reports. We shared all data and software deliverables under the PhysioNet Credentialed Health Data License 1.5.0 to enable further research on Automatic Radiology Report Generation.
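The abstract reports micro-F1 scores for entity, attribute and relation extraction. As a rough illustration of how a micro-averaged F1 is typically computed, the sketch below pools true positives, false positives and false negatives over all reports before computing precision and recall. It is not the authors' evaluation code: the function name `micro_f1`, the `(start, end, label)` tuple format and the labels `"Observation"` and `"Anatomy"` are hypothetical placeholders chosen for this example, and exact-match scoring is an assumption.

```python
from typing import Hashable, Sequence, Set


def micro_f1(gold_docs: Sequence[Set[Hashable]],
             pred_docs: Sequence[Set[Hashable]]) -> float:
    """Micro-averaged F1 over per-report sets of annotations.

    Each element of gold_docs / pred_docs is the set of annotations for
    one report. An annotation counts as correct only on an exact match
    (an assumption made for this sketch).
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted and present in gold
        fp += len(pred - gold)   # predicted but absent from gold
        fn += len(gold - pred)   # in gold but not predicted
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: (token_start, token_end, label) per entity.
gold_docs = [
    {(0, 1, "Observation"), (3, 3, "Anatomy")},
    {(5, 5, "Observation")},
]
pred_docs = [
    {(0, 1, "Observation"), (3, 3, "Observation")},  # one wrong label
    {(5, 5, "Observation")},
]

score = micro_f1(gold_docs, pred_docs)
print(f"micro-F1 = {score:.3f}")
```

Because the counts are pooled before averaging, reports with many annotations contribute more to the score than short ones, which is the usual reason for reporting micro-F1 rather than macro-F1 on extraction tasks.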
Item Type: | Article |
---|---|
Date Type: | Published Online |
Status: | In Press |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Publisher: | Institute of Electrical and Electronics Engineers |
ISSN: | 2169-3536 |
Date of First Compliant Deposit: | 2 December 2024 |
Date of Acceptance: | 19 November 2024 |
Last Modified: | 03 Dec 2024 10:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/174225 |