Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Review of synthetic data terminology for privacy preserving use cases

Frayling, Lora, Shah, Suraj Bharat, Pattinson, Elizabeth, Stock, Joshua, Lugg-Widger, Fiona ORCID: https://orcid.org/0000-0003-0029-9703, Gordon, Emma and Oliver, Emily 2025. Review of synthetic data terminology for privacy preserving use cases. International Journal of Population Data Science 10 (2), 08. 10.23889/ijpds.v10i2.2967

PDF - Published Version (635kB)
Available under License Creative Commons Attribution.

Abstract

Synthetic data is emerging as a key area of development for supporting research that involves secure forms of administrative and health data, both in the United Kingdom and globally. In practice, key challenges in the generation and adoption of synthetic data are closely tied to the need for agreed and consistent terminology for describing it. The absence of standardised language hinders the setting of quality standards, the establishment of governance and guidelines, and the effective sharing of knowledge and best practices. This has implications for research that uses synthetic healthcare and administrative data, particularly when such data are generated from protected personal data. This commentary paper reviews existing literature on synthetic data to explore how key terms are currently defined in practice, with a focus on privacy-preserving use cases. Our analysis reveals that terms describing properties of synthetic data are often lacking or inconsistent, largely due to the breadth of synthetic data types, contexts and use cases. Context-specific terminology with nuanced meanings complicates efforts to develop universally agreed definitions, particularly for privacy-preserving synthetic data that captures characteristics from protected data sources. To address this, we propose broad definitions for key terms including synthetic data, utility, utility measure and fidelity. We conclude by offering a set of recommendations emphasising the need for consensus on terminology and encouraging clearer descriptions in future literature that specify both the intended use of the data and the measures used to describe it.

Item Type: Article
Date Type: Publication
Status: Published
Schools: Medicine
Research Institutes & Centres: Centre for Trials Research (CNTRR)
Publisher: Swansea University
ISSN: 2399-4908
Date of First Compliant Deposit: 27 October 2025
Date of Acceptance: 27 June 2025
Last Modified: 27 Oct 2025 12:45
URI: https://orca.cardiff.ac.uk/id/eprint/181910