Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example

Li, Yangmei, Kurinczuk, Jennifer J, Alderdice, Fiona, Quigley, Maria A, Rivero-Arias, Oliver, Sanders, Julia ORCID: https://orcid.org/0000-0001-5712-9989, Kenyon, Sara, Siassakos, Dimitrios, Parekh, Nikesh, De Almeida, Suresha and Carson, Claire 2025. Addressing uncertainty in identifying pregnancies in the English CPRD GOLD Pregnancy Register: a methodological study using a worked example. International Journal of Population Data Science 10 (1), 08. 10.23889/ijpds.v10i1.2471

PDF - Published Version
Available under License Creative Commons Attribution.


Abstract

Introduction: Electronic health records are invaluable for pregnancy-related studies. The Clinical Practice Research Datalink (CPRD) Pregnancy Register (PR) identifies pregnancies in primary care records, including uncertain cases.

Objectives: This paper outlines a method to reduce uncertainty in identifying pregnancies within CPRD GOLD PR data, exemplified through a study investigating the provision of pre-pregnancy care.

Methods: We used CPRD Mother Baby Link (MBL) and Maternity Hospital Episode Statistics (HES) to clean and augment the CPRD PR data. The study included all women aged 18-48 years who were registered at an English GP practice within CPRD on 1 January 2017, had a year of prior registration and were eligible for hospital data linkage. We developed a cleaning and combining algorithm and further applied strict data quality criteria to form three populations: 'as provided', 'derived' (using our algorithm) and 'strictly derived' (with stricter data quality criteria). We compared characteristics and outcomes across these populations, examining potential biases in effect estimates using the 'as provided' population.

Results: Our algorithm added 22,270 (~7%) pregnancies from hospital data to the CPRD PR (1997-2021), eliminated conflicting pregnancies and pregnancies with unknown outcomes, and minimised potentially non-contemporaneous records of past pregnancies or partial records of pregnancies. For all pregnancies across women's reproductive history, the 'strictly derived' population, characterised by better data quality, showed a higher prevalence of pre-existing medical conditions and increased pre-pregnancy care. In this dataset, recording of both exposure and outcome was better, and the magnitude of the association between exposure and outcome was reduced compared to the 'as provided' population.

Conclusion: PR data requires cleaning before use. This study presents a pragmatic and practical method to identify pregnancies using existing CPRD data and linked records, without needing additional data. Researchers should carefully consider their studies' specific requirements and may adapt our proposed methodology accordingly to align with their research questions.
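
The abstract does not spell out the rules of the cleaning and combining algorithm, so the following Python sketch is only an illustration of the general idea (augment a register with hospital deliveries, then drop episodes with unknown outcomes and episodes that conflict). Every name and rule in it is an assumption made for the example, not the authors' method: the `Pregnancy` record, the `derive` function, the overlap test used to define a "conflict", and the fixed 280-day gestation used to estimate a start date for hospital-only deliveries.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical record layout; the real register has many more fields.
@dataclass(frozen=True)
class Pregnancy:
    patient_id: str
    start: date
    end: date
    outcome: Optional[str]  # None stands for an unknown outcome
    source: str             # "register" or "hospital"

# Illustrative assumption: estimate the start of a hospital-only
# delivery as 280 days before the delivery date.
ASSUMED_GESTATION = timedelta(days=280)

def overlaps(a: Pregnancy, b: Pregnancy) -> bool:
    """True if two episodes of the same patient overlap in time."""
    return (a.patient_id == b.patient_id
            and a.start <= b.end and b.start <= a.end)

def derive(register: List[Pregnancy],
           hospital_deliveries: List[Tuple[str, date]]) -> List[Pregnancy]:
    """Sketch of a 'derived' population under the assumptions above."""
    combined = list(register)
    # Step 1: add hospital deliveries not already covered by an episode.
    for patient_id, delivery_date in hospital_deliveries:
        candidate = Pregnancy(patient_id, delivery_date - ASSUMED_GESTATION,
                              delivery_date, "delivery", "hospital")
        if not any(overlaps(candidate, p) for p in combined):
            combined.append(candidate)
    # Step 2: drop episodes with unknown outcomes.
    known = [p for p in combined if p.outcome is not None]
    # Step 3: drop every episode that overlaps another one (a "conflict").
    return [p for p in known
            if not any(overlaps(p, q) for q in known if q is not p)]
```

In this sketch, conflicting episodes are removed entirely rather than reconciled; a real study might instead prefer one source over the other, which is the kind of choice the abstract suggests researchers adapt to their own research question.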

Item Type: Article
Date Type: Publication
Status: Published
Schools: Schools > Healthcare Sciences
Publisher: Swansea University
ISSN: 2399-4908
Date of First Compliant Deposit: 14 March 2025
Date of Acceptance: 7 January 2025
Last Modified: 17 Mar 2025 11:16
URI: https://orca.cardiff.ac.uk/id/eprint/176873
