Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Data privacy preserving scheme using generalised linear models

Lee, Min Cherng, Mitra, Robin ORCID: https://orcid.org/0000-0001-9584-8044, Lazaridis, Emmanuel, Lai, An-Chow, Goh, Yong Kheng and Yap, Wun-She 2017. Data privacy preserving scheme using generalised linear models. Computers and Security 69, pp. 142-154. 10.1016/j.cose.2016.12.009

Full text not available from this repository.

Abstract

When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Commonly used approaches, such as adding random noise, top-coding variables and swapping data values, distort the relationships in the original data. To preserve utility and reduce disclosure risk, this paper considers the synthetic data approach. We release multiply imputed, partially synthetic data sets in which the original data values are retained, except for values at high risk of disclosure, which are replaced by synthetic values. To generate these synthetic values, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. We also propose a new algorithm for identifying the original data values that need to be replaced with synthetic data. Importantly, this algorithm for identifying data at high disclosure risk can also be applied to other existing statistical disclosure control schemes. The proposed scheme preserves data privacy because individuals are difficult to identify when the released synthetic data are not identical to the original data. In addition, valid inferences can be drawn from the released data using simple combining rules that account for the uncertainty introduced by the synthetic values. To evaluate the performance of the proposed scheme in terms of the disclosure risk and the utility of the released synthetic data, we conduct an experiment on a data set taken from the 1987 National Indonesia Contraceptive Prevalence Survey. The results support the applicability of the proposed data privacy preserving scheme in reducing the risk of disclosure while preserving the utility of the released data.
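The abstract does not state which combining rules it uses. As an assumption, the sketch below applies the combining rules commonly used for multiply imputed, partially synthetic data (Reiter, 2003). An analyst fits the same model to each of the m released data sets, obtaining a point estimate q_l and its variance u_l. The combined estimate is the mean of the q_l. The total variance is T_p = ū_m + b_m/m, where ū_m is the mean of the u_l and b_m is the sample variance of the q_l. The names `combine_partially_synthetic` and `CombinedEstimate` are hypothetical and illustrative only. This is not code from the paper.

```python
from dataclasses import dataclass
from statistics import mean, variance
from typing import Sequence


@dataclass
class CombinedEstimate:
    estimate: float        # q_bar_m: average of the m point estimates
    total_variance: float  # T_p = u_bar_m + b_m / m
    df: float              # degrees of freedom for a reference t-distribution


def combine_partially_synthetic(estimates: Sequence[float],
                                variances: Sequence[float]) -> CombinedEstimate:
    """Hypothetical helper: combine per-data-set results using the standard
    rules for partially synthetic data (an assumption; the paper's exact
    rules are not given in the abstract)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two synthetic data sets with matching variances")
    q_bar = mean(estimates)   # combined point estimate
    b_m = variance(estimates) # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)   # average within-data-set variance
    t_p = u_bar + b_m / m     # total variance under partial synthesis
    # nu_p = (m - 1) * (1 + u_bar / (b_m / m))**2; infinite if estimates agree exactly
    df = float("inf") if b_m == 0 else (m - 1) * (1 + u_bar / (b_m / m)) ** 2
    return CombinedEstimate(q_bar, t_p, df)


if __name__ == "__main__":
    # Illustrative numbers only: estimates of one coefficient from m = 3 synthetic data sets
    result = combine_partially_synthetic([1.0, 1.2, 0.8], [0.04, 0.05, 0.06])
    print(result)
```

Unlike the rules for fully synthetic data, T_p adds b_m/m to ū_m and does not subtract ū_m. This is because the unreplaced original values already carry the sampling variability.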

Item Type: Article
Date Type: Publication
Status: Published
Schools: Mathematics
Publisher: Elsevier
ISSN: 0167-4048
Last Modified: 06 May 2023 02:00
URI: https://orca.cardiff.ac.uk/id/eprint/141152

Citation Data

Cited 4 times in Scopus.
