Chopard, Daphne, Treder, Matthias ORCID: https://orcid.org/0000-0001-5955-2326, Corcoran, Padraig ORCID: https://orcid.org/0000-0001-9731-3385, Johnson, Claire, Busse-Morris, Monica ORCID: https://orcid.org/0000-0002-5331-5909 and Spasic, Irena ORCID: https://orcid.org/0000-0002-8132-3885 2021. Text mining of adverse events in clinical trials: Deep learning approach. JMIR Medical Informatics 9 (12), e28632. 10.2196/28632
PDF (Published Version). Available under License Creative Commons Attribution.
Abstract
Background: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events.

Objective: This study aimed to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of serious adverse event report forms, to enable a statistical analysis of the aforementioned patterns.

Methods: We used the Unified Medical Language System (UMLS) as the coding scheme. The UMLS integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as ICD-10, MedDRA and SNOMED. We used MetaMap, a highly configurable dictionary lookup software, to identify mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represent adverse events and those that do not.

Results: The model achieved a high F1 score of 0.8080 despite the class imbalance. This is 10.15 percentage points lower than human-like performance, but also 17.45 percentage points higher than the baseline approach.

Conclusions: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.

Keywords: natural language processing; deep learning; machine learning; classification
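The Results figures can be unpacked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It shows how an F1 score is computed from confusion counts, why F1 is more informative than accuracy under class imbalance, and which comparison scores the reported percentage-point gaps imply. The confusion counts are hypothetical, chosen only to mimic an imbalanced data set; the only values drawn from the abstract are 0.8080, 10.15 and 17.45.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


# HYPOTHETICAL counts (not from the paper): 98 positives among 1000 mentions.
tp, fp, fn, tn = 80, 20, 18, 882
model_f1 = f1_score(tp, fp, fn)
model_acc = accuracy(tp, fp, fn, tn)

# A trivial classifier that never predicts "adverse event" on the same data:
# accuracy still looks high, but F1 collapses to zero.
trivial_f1 = f1_score(tp=0, fp=0, fn=98)
trivial_acc = accuracy(tp=0, fp=0, fn=98, tn=902)

# Scores implied by the abstract's percentage-point gaps (values from the abstract).
REPORTED_F1 = 0.8080
implied_human_like_f1 = round(REPORTED_F1 + 0.1015, 4)  # 10.15 points higher
implied_baseline_f1 = round(REPORTED_F1 - 0.1745, 4)    # 17.45 points lower

print(f"model:   F1={model_f1:.4f}  accuracy={model_acc:.4f}")
print(f"trivial: F1={trivial_f1:.4f}  accuracy={trivial_acc:.4f}")
print(f"implied human-like F1={implied_human_like_f1}, baseline F1={implied_baseline_f1}")
```

Read this way, the gaps quoted in the abstract correspond to an F1 of about 0.9095 for human-like performance and about 0.6335 for the baseline approach.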
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics; Centre for Trials Research (CNTRR); Data Innovation Research Institute (DIURI) |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Publisher: | JMIR Publications |
ISSN: | 2291-9694 |
Funders: | EPSRC |
Date of First Compliant Deposit: | 18 January 2022 |
Date of Acceptance: | 14 November 2021 |
Last Modified: | 19 May 2023 01:18 |
URI: | https://orca.cardiff.ac.uk/id/eprint/145494 |