ORCA
Online Research @ Cardiff

Adapting historical clinical genetic test records for anonymised data linkage: obstacles and opportunities

Maddison, Robert, Reed, Karen R., Cannings-John, Rebecca, Lugg-Widger, Fiona, Stoneman, Thomas, Anderson, Sarah and Fry, Andrew E.

2025. Adapting historical clinical genetic test records for anonymised data linkage: obstacles and opportunities. International Journal of Population Data Science 8 (5), pp. 1-6. 10.23889/ijpds.v8i5.2924

PDF - Published Version
Available under License Creative Commons Attribution.

Official URL: http://dx.doi.org/10.23889/ijpds.v8i5.2924

Abstract

Introduction: Cystic fibrosis (CF) heterozygotes (also known as 'carriers') are people who have one mutated copy of the CFTR gene. Research into the health risks of CF carriers has been limited by a lack of large cohorts tested for CF carrier status, but routine clinical testing identifies CF carriers in the population. Such test records also contain large amounts of clinical information, making them a valuable research resource, both for identifying CF carriers in the population and for providing data not found elsewhere.

Methods: Following governance approvals, we adapted 30 years' worth of CF genetic testing records generated by the All-Wales Medical Genomics Service (AWMGS) and submitted them to the SAIL Databank for anonymised linkage.

Results: Unexpected obstacles meant that only a minimal amount of clinical information could be annotated ahead of linkage. The raw data were highly heterogeneous due to the records' longitudinal collection and clinical origins, making standardisation difficult. Moreover, the presence of unique identifiers in the clinical data violated the separation principle, requiring manual annotation to produce a cleaned dataset. Explicit identification of patients or their relatives throughout the records complicated split-file anonymisation.

Conclusion: Extracting useful information from historical clinical genetic test records is a significant challenge with technical and governance aspects. The mixing of unique identifiers with clinical data in heterogeneous, unstructured free text, combined with a lack of automated tools, meant that manual annotation was required to adhere to the separation principle. As such, only a minimum of the available clinical data could be annotated within the project timeline, and mutually exclusive access to the identifiable and pseudonymised data meant that annotations could not later be validated. Future efforts to link clinical genetic test records for research must consider these challenges in their approach.

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Medicine
Publisher:	Swansea University
ISSN:	2399-4908
Date of First Compliant Deposit:	11 March 2025
Date of Acceptance:	10 December 2024
Last Modified:	14 Mar 2025 15:17
URI:	https://orca.cardiff.ac.uk/id/eprint/176808
