Li, Yangmei, Kurinczuk, Jennifer J, Alderdice, Fiona, Quigley, Maria A, Rivero-Arias, Oliver, Sanders, Julia ![]() ![]() |
Preview |
PDF
- Published Version
Available under License Creative Commons Attribution. Download (1MB) | Preview |
Abstract
Introduction Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including uncertain cases. Objectives This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, exemplified through a study investigating the provision of pre-pregnancy care. Methods We used CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18-48yrs, registered at an English GP practice within CPRD on 01/01/2017, with a year of prior registration and eligibility for hospital data linkage. We developed a cleaning and combining algorithm and further applied strict data quality criteria to form three populations: 'as provided', 'derived' (using our algorithm) and 'strictly derived' (with stricter data quality criteria). We compared characteristics and outcomes across these populations, examining potential biases in effect estimates using the 'as provided' population. Results Our algorithm added 22,270 (~7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies or partial records of pregnancies. For all pregnancies across women's reproductive history, in the `strictly derived' population, characterised by better data quality, a higher prevalence of pre-existing medical conditions and increased pre-pregnancy care were observed. In this dataset, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was reduced compared to the `as provided' population. Conclusion PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies' specific requirements and may adapt our proposed methodology accordingly to align with their research questions.
Item Type: | Article |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Schools > Healthcare Sciences |
Publisher: | Swansea University |
ISSN: | 2399-4908 |
Date of First Compliant Deposit: | 14 March 2025 |
Date of Acceptance: | 7 January 2025 |
Last Modified: | 17 Mar 2025 11:16 |
URI: | https://orca.cardiff.ac.uk/id/eprint/176873 |
Actions (repository staff only)
![]() |
Edit Item |