ORCA
Online Research @ Cardiff

Clear Cookie - decide language by browser settings

Optimal string sanitization against strategic attackers

Bian, Pengxin, Theodorakopoulos, Georgios

, Pissis, Solon P. and Loukides, Grigorios 2025. Optimal string sanitization against strategic attackers. IEEE Transactions on Information Forensics and Security 10.1109/TIFS.2025.3610245

Preview	PDF - Accepted Post-Print Version Download (2MB) \| Preview
Preview	PDF (Supplementary Material) - Supplemental Material Download (1MB) \| Preview

Official URL: https://doi.org/10.1109/TIFS.2025.3610245

Abstract

Strings (sequences of elements) are often disseminated to support applications, e.g., in bioinformatics, web analysis, and transportation. Unfortunately, this may expose sensitive patterns that model confidential knowledge. Concealing the occurrences of sensitive patterns in a string (e.g. by deleting some elements) while minimizing the associated quality loss has been the objective of several string sanitization methods. However, real-world attackers are likely to possess background knowledge about the string, e.g., an individual's genome sequence is almost identical to a reference genome sequence. In addition, it is good security practice to assume that the attacker will know the algorithm that has been used to sanitize the string (Kerckhoffs’ principle). Yet, all existing methods fail to protect strings against such attackers, risking privacy breaches in critical applications. In our work, we consider for the first time how to defend against strategic attackers who possess such knowledge. To achieve this, we propose a novel framework to sanitize a string by probabilistically replacing carefully selected patterns. As part of this framework, we design three mathematical programming algorithms which compute the optimal replacement probabilities under different objectives and constraints, offering different privacy gain / quality loss tradeoffs. Our algorithms protect against strategic attackers using new concepts and measures, protect sensitive patterns of any length, and can construct one or more optimally sanitized strings that can be used in applications such as frequent pattern mining. Our experiments using five real-world datasets from different domains show that all our algorithms are substantially more effective than a natural baseline (e.g., they offer up to 2 times more privacy when they are configured to incur the same quality loss, and up to 3 times lower quality loss when they are configured to offer the same privacy). They also show that two "hybrid" algorithms that we propose, based on combining elements of the above algorithms, inherit the advantages of their constituent algorithms. These results, coupled with the generality of our approach, make our algorithms practical and beneficial for deployment.

Item Type:	Article
Date Type:	Published Online
Status:	In Press
Schools:	Schools > Computer Science & Informatics
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher:	Institute of Electrical and Electronics Engineers
ISSN:	1556-6013
Date of First Compliant Deposit:	12 September 2025
Date of Acceptance:	30 August 2025
Last Modified:	22 Sep 2025 13:00
URI:	https://orca.cardiff.ac.uk/id/eprint/181065

Actions (repository staff only)

Edit Item

Altmetric

Dimensions

Download Statistics

Downloads

Downloads per month over past year

View more statistics

CORE (COnnecting REpositories)