Ong, Hoang. 2015. Semantic attack on transaction data anonymised by set-based generalisation. PhD Thesis, Cardiff University.
Item availability restricted.
PDF - Accepted Post-Print Version (1MB)
PDF - Supplemental Material (321kB), restricted to repository staff only
Abstract
Publishing data that contains information about individuals may lead to privacy breaches. However, data publishing is useful for supporting research and analysis. Privacy protection in data publishing has therefore become important and has received much recent attention. To improve privacy protection, many researchers have investigated how secure published data is by designing de-anonymisation methods to attack anonymised data. Most de-anonymisation methods treat anonymised data in a syntactic manner: items in a dataset are considered to be contextless or even meaningless literals, and the semantics of these items is not taken into account. In this thesis, we investigate how secure anonymised data is under attacks that use semantic information. More specifically, we propose a de-anonymisation method to attack transaction data anonymised by set-based generalisation. Set-based generalisation protects data by replacing one item with a set of items, so that the identity of an individual can be hidden. Our goal is to identify those items that are added to a transaction during generalisation. Our attacking method has two components: scoring and elimination. Scoring measures the semantic relationship between items in a transaction, and elimination removes items that are deemed not to be in the original transaction. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 70% of the items added to the transactions during generalisation can be detected by our method with a precision greater than 85%.
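The scoring and elimination idea described in the abstract can be illustrated with a toy sketch. This is not the thesis's method: the item names, the `GENERALISATION` map, the category-based `relatedness` function, the averaging in `score`, and the `threshold` value are all hypothetical choices made for illustration, and the published transaction is assumed to be the flattened union of the generalised item sets. The sketch only shows how semantically unrelated items might stand out after generalisation, and how precision and recall of the detected additions could be computed.

```python
# Hypothetical generalisation map: each original item is published as a set of items.
GENERALISATION = {
    "insulin": frozenset({"insulin", "lipstick"}),
    "glucose meter": frozenset({"glucose meter", "football"}),
    "test strips": frozenset({"test strips", "novel"}),
}

# Hypothetical semantic categories, standing in for a real relatedness measure.
CATEGORY = {
    "insulin": "diabetes care",
    "glucose meter": "diabetes care",
    "test strips": "diabetes care",
    "lipstick": "cosmetics",
    "football": "sport",
    "novel": "books",
}


def generalise(transaction):
    """Replace every item by its generalised item set and flatten the result."""
    published = set()
    for item in transaction:
        published |= GENERALISATION.get(item, frozenset({item}))
    return published


def relatedness(a, b):
    """Toy semantic relatedness: 1.0 if both items share a category, else 0.0."""
    return 1.0 if CATEGORY[a] == CATEGORY[b] else 0.0


def score(item, transaction):
    """Scoring: average relatedness between an item and the other items."""
    others = [o for o in transaction if o != item]
    if not others:
        return 0.0
    return sum(relatedness(item, o) for o in others) / len(others)


def eliminate(transaction, threshold=0.3):
    """Elimination: flag items whose score is below the (assumed) threshold."""
    return {i for i in transaction if score(i, transaction) < threshold}


def precision_recall(flagged, truly_added):
    """Precision and recall of the flagged items against the true additions."""
    true_positives = len(flagged & truly_added)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_added) if truly_added else 0.0
    return precision, recall


original = {"insulin", "glucose meter", "test strips"}
published = generalise(original)
flagged = eliminate(published)

print(sorted(flagged))  # ['football', 'lipstick', 'novel']
print(precision_recall(flagged, published - original))  # (1.0, 1.0)
```

In this contrived example every added item is unrelated to the rest of the transaction, so the attack recovers all of them. Real data would need a graded relatedness measure and a tuned threshold, which is where results such as the reported detection rate and precision would come from.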
Item Type: | Thesis (PhD) |
---|---|
Status: | Unpublished |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Uncontrolled Keywords: | privacy, de-anonymisation, set-based generalisation, unstructured data |
Date of First Compliant Deposit: | 30 March 2016 |
Last Modified: | 19 Mar 2016 23:57 |
URI: | https://orca.cardiff.ac.uk/id/eprint/74553 |