Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Promoting counterfactual robustness through diversity

Leofante, Francesco and Potyka, Nico 2024. Promoting counterfactual robustness through diversity. Presented at: The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24), 20-27 February 2024. Proceedings of the AAAI Conference on Artificial Intelligence: AAAI-24 Special Track Safe, Robust and Responsible AI Track, vol. 38 (19). Association for the Advancement of Artificial Intelligence, pp. 21322-21330. 10.1609/aaai.v38i19.30127

PDF - Published Version

Abstract

Counterfactual explanations shed light on the decisions of black-box models by explaining how an input can be altered to obtain a favourable decision from the model (e.g., when a loan application has been rejected). However, as noted recently, counterfactual explainers may lack robustness in the sense that a minor change in the input can cause a major change in the explanation. This can cause confusion on the user side and open the door for adversarial attacks. In this paper, we study some sources of non-robustness. While there are fundamental reasons why an explainer that returns a single counterfactual cannot be robust in all instances, we show that some interesting robustness guarantees can be given by reporting multiple counterfactuals rather than a single one. Unfortunately, the number of counterfactuals that need to be reported for the theoretical guarantees to hold can be prohibitively large. We therefore propose an approximation algorithm that uses a diversity criterion to select a feasible number of the most relevant explanations and study its robustness empirically. Our experiments indicate that our method improves on the state of the art in generating robust explanations, while maintaining other desirable properties and providing competitive computational performance.
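
The abstract does not spell out the diversity criterion, so the following is only an illustrative sketch of the general idea of picking a small, diverse subset from a larger pool of candidate counterfactuals. It is not the authors' algorithm: it uses a generic greedy farthest-point heuristic with Euclidean distance, and the names `select_diverse`, `x`, `candidates` and `k` are hypothetical.

```python
import math


def select_diverse(x, candidates, k):
    """Greedily pick up to k mutually distant candidates (illustrative only).

    x          -- the original input, as a tuple of numbers (assumed encoding)
    candidates -- candidate counterfactuals, each a tuple of the same length
    k          -- maximum number of counterfactuals to report
    """
    if k <= 0 or not candidates:
        return []
    remaining = list(candidates)
    # Seed with the candidate closest to the input, i.e. the cheapest change.
    first = min(remaining, key=lambda c: math.dist(x, c))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        # Add the candidate whose nearest already-selected neighbour is
        # farthest away, which spreads the selection across the pool.
        nxt = max(
            remaining,
            key=lambda c: min(math.dist(c, s) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


if __name__ == "__main__":
    pool = [(1.0, 0.0), (1.1, 0.0), (0.0, 3.0), (-2.0, 0.0)]
    print(select_diverse((0.0, 0.0), pool, 3))
```

In this toy pool, the near-duplicate candidate `(1.1, 0.0)` is skipped in favour of candidates that lie in different directions from the input, which is the intuition behind reporting a diverse set instead of a single counterfactual.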

Item Type: Conference or Workshop Item (Paper)
Date Type: Publication
Status: Published
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Publisher: Association for the Advancement of Artificial Intelligence
ISBN: 978-1-57735-887-9
Date of First Compliant Deposit: 15 May 2024
Date of Acceptance: 9 December 2023
Last Modified: 17 Jun 2024 01:30
URI: https://orca.cardiff.ac.uk/id/eprint/168929
