Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Semantic attack on transaction data anonymised by set-based generalisation

Ong, Hoang 2015. Semantic attack on transaction data anonymised by set-based generalisation. PhD Thesis, Cardiff University.
Item availability restricted.


Publishing data that contains information about individuals may lead to privacy breaches. However, data publishing is useful for supporting research and analysis. Privacy protection in data publishing has therefore become important and has received much recent attention. To improve privacy protection, many researchers have investigated how secure published data is by designing de-anonymisation methods to attack anonymised data. Most de-anonymisation methods treat anonymised data syntactically: items in a dataset are regarded as contextless or even meaningless literals, and the semantics of these items are not considered. In this thesis, we investigate how secure anonymised data is under attacks that use semantic information. More specifically, we propose a de-anonymisation method to attack transaction data anonymised by set-based generalisation. Set-based generalisation protects data by replacing one item with a set of items, so that the identity of an individual can be hidden. Our goal is to identify those items that are added to a transaction during generalisation. Our attack method has two components: scoring and elimination. Scoring measures the semantic relationship between items in a transaction, and elimination removes items that are deemed not to be in the original transaction. Our experiments on both real and synthetic data show that set-based generalisation may not provide adequate protection for transaction data: about 70% of the items added to the transactions during generalisation can be detected by our method, with a precision greater than 85%.
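
The scoring-and-elimination idea can be illustrated with a minimal sketch. This is not the method from the thesis, whose scoring function is not given in this record. The relatedness table, the `generalise`, `score`, `attack` and `precision_recall` functions, the threshold of 0.5 and the toy items are all assumptions made for illustration. Here an item is scored by its average best-match relatedness to the other generalised items in the transaction, and low-scoring items are eliminated unless they are the best candidate in their own set.

```python
# Hypothetical toy relatedness values in [0, 1]; a real attack would need
# an external semantic measure, which this record does not specify.
RELATEDNESS = {
    frozenset({"bread", "butter"}): 0.9,
    frozenset({"bread", "milk"}): 0.7,
    frozenset({"butter", "milk"}): 0.8,
    frozenset({"hammer", "nails"}): 0.9,
}


def relatedness(a, b):
    """Symmetric lookup; unknown pairs are treated as unrelated."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def generalise(transaction, mapping):
    """Set-based generalisation: replace each item with a set of items."""
    published = []
    for item in transaction:
        generalised = frozenset(mapping.get(item, {item}))
        if generalised not in published:
            published.append(generalised)
    return published


def score(item, own_index, published):
    """Scoring: mean of the best relatedness to each other generalised item."""
    others = [s for i, s in enumerate(published) if i != own_index]
    if not others:
        return 0.0
    best_matches = [max(relatedness(item, y) for y in s) for s in others]
    return sum(best_matches) / len(others)


def attack(published, threshold=0.5):
    """Elimination: flag low-scoring items, keeping the best item per set."""
    flagged = set()
    for i, generalised in enumerate(published):
        scores = {x: score(x, i, published) for x in generalised}
        best = max(scores, key=scores.get)
        for x, s in scores.items():
            if x != best and s < threshold:
                flagged.add(x)
    return flagged


def precision_recall(flagged, truly_added):
    """Precision: share of flagged items that were really added.
    Recall: share of added items that were flagged."""
    hits = len(flagged & truly_added)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truly_added) if truly_added else 0.0
    return precision, recall


original = ["bread", "milk", "butter"]
mapping = {"bread": {"bread", "hammer"}, "milk": {"milk", "nails"}}
published = generalise(original, mapping)
flagged = attack(published)
added = set().union(*published) - set(original)
print(sorted(flagged))
print(precision_recall(flagged, added))
```

In this toy transaction the two added items, `hammer` and `nails`, score 0.45 each and are flagged, while the original items score 0.75 or higher. Reading the reported figures in these terms, "about 70% of the added items detected" would correspond to recall and "greater than 85%" to precision, though that mapping is an interpretation of the abstract rather than something the record states.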

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Uncontrolled Keywords: privacy, de-anonymisation, set-based generalisation, unstructured data
Date of First Compliant Deposit: 30 March 2016
Last Modified: 19 Mar 2016 23:57
