Statistical disclosure control for data privacy using sequence of generalised linear models

Lee, Min Cherng, Mitra, Robin, Lazaridis, Emmanuel, Lai, An Chow, Goh, Yong Kheng and Yap, Wun-She 2016. Statistical disclosure control for data privacy using sequence of generalised linear models. Presented at: 21st Australasian Conference on Information Security and Privacy (ACISP 2016), Melbourne, VIC, Australia, 4-6 July 2016. Published in: Liu, Joseph K. and Steinfeld, Ron eds. Information Security and Privacy: 21st Australasian Conference, ACISP 2016, Melbourne, VIC, Australia, July 4-6, 2016, Proceedings, Part I. Lecture Notes in Computer Science, 0302-9743, vol. 9722. Springer Verlag, pp. 77-93. 10.1007/978-3-319-40253-6_5

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-319-40253-6_5

Abstract

When releasing data for public use, statistical agencies seek to reduce the risk of disclosure while preserving the utility of the released data. Common approaches such as adding random noise, top-coding variables and swapping data values distort the relationships in the original data. To reduce disclosure risk while preserving utility, we consider the synthetic data approach in this paper: we release multiply imputed, partially synthetic data sets that comprise original data values, with the values at high disclosure risk replaced by synthetic values. To generate such synthetic data, we introduce a new variant of the factored regression model proposed by Lee and Mitra in 2016. In addition, we propose a new algorithm for identifying the original data values that need to be replaced with synthetic data. With our proposed methods, data privacy can be preserved because it is difficult to identify an individual when the released synthetic data are not entirely the same as the original data. Moreover, valid inferences about the data can be made using simple combining rules, which take into account the uncertainty due to the presence of synthetic values. To evaluate the performance of our proposed methods in terms of the risk of disclosure and the utility of the released synthetic data, we conduct an experiment on a dataset taken from the 1987 National Indonesia Contraceptive Prevalence Survey.
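
The abstract does not spell out the combining rules. As a minimal sketch, assuming they are the standard rules for partially synthetic data due to Reiter (2003), the point estimate is the average of the estimates from the m released data sets, and its variance is the average within-data-set variance plus the between-data-set variance divided by m. The function name and the example numbers below are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def combine_partially_synthetic(estimates, variances):
    """Combine results from m partially synthetic data sets.

    Assumes the combining rules of Reiter (2003); the paper itself may
    use a different formulation.
    estimates: point estimate q_i computed on each synthetic data set
    variances: estimated variance u_i of each q_i
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)       # combined point estimate
    b = variance(estimates)       # between-data-set variance (divisor m - 1)
    u_bar = mean(variances)       # average within-data-set variance
    total_var = u_bar + b / m     # variance of q_bar
    return q_bar, total_var


# Hypothetical estimates of a proportion from m = 3 synthetic data sets.
q_bar, total_var = combine_partially_synthetic(
    [0.52, 0.48, 0.50], [0.010, 0.012, 0.011]
)
```

Note that the between-data-set term is b / m here, not the (1 + 1/m) b used when combining multiple imputations for missing data, because only part of each record is synthetic.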

Item Type:	Conference or Workshop Item - published (Paper)
Date Type:	Published Online
Status:	Published
Schools:	Schools > Mathematics
Publisher:	Springer Verlag
ISSN:	0302-9743
Last Modified:	06 May 2023 02:00
URI:	https://orca.cardiff.ac.uk/id/eprint/141153

