Maddison, Robert, Reed, Karen R.
PDF
- Published Version
Available under License Creative Commons Attribution.
Abstract
Introduction: Cystic fibrosis (CF) heterozygotes (also known as 'carriers') are people who have one mutated copy of the CFTR gene. Research into the health risks of CF carriers has been limited by a lack of large cohorts tested for CF carrier status, but routine clinical testing identifies CF carriers in the population. The resulting test records also contain large amounts of clinical information, making them a valuable research resource: they not only identify CF carriers in the population but also provide data not found elsewhere.

Methods: Following governance approvals, we adapted 30 years' worth of CF genetic testing records generated by the All-Wales Medical Genomics Service (AWMGS) and submitted them to the SAIL Databank for anonymised linkage.

Results: Unexpected obstacles meant that only a minimal amount of clinical information could be annotated ahead of linkage. The raw data were highly heterogeneous because of the records' longitudinal collection and clinical origins, making standardisation difficult. Moreover, the presence of unique identifiers in the clinical data violated the separation principle, so manual annotation was required to produce a cleaned dataset. Explicit identification of patients or their relatives throughout the records complicated split-file anonymisation.

Conclusion: Extracting useful information from historical clinical genetic test records is a significant challenge with both technical and governance aspects. The mixing of unique identifiers with clinical data in heterogeneous, unstructured free text, combined with a lack of automated tools, meant that manual annotation was required to adhere to the separation principle. As a result, only a minimum of the available clinical data could be annotated within the project timeline, and mutually exclusive access to the identifiable and pseudonymised data meant that annotations could not later be validated. Future efforts to link clinical genetic test records for research must consider these challenges in their approach.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Schools > Medicine |
Publisher: | Swansea University |
ISSN: | 2399-4908 |
Date of First Compliant Deposit: | 11 March 2025 |
Date of Acceptance: | 10 December 2024 |
Last Modified: | 14 Mar 2025 15:17 |
URI: | https://orca.cardiff.ac.uk/id/eprint/176808 |