Leofante, Francesco and Potyka, Nico 2024. Promoting counterfactual robustness through diversity. Presented at: The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24), 20-27 February 2024. Proceedings of the AAAI Conference on Artificial Intelligence: AAAI-24 Special Track Safe, Robust and Responsible AI Track, vol. 38 (19). Association for the Advancement of Artificial Intelligence, pp. 21322-21330. 10.1609/aaai.v38i19.30127
Abstract
Counterfactual explanations shed light on the decisions of black-box models by explaining how an input can be altered to obtain a favourable decision from the model (e.g., when a loan application has been rejected). However, as noted recently, counterfactual explainers may lack robustness in the sense that a minor change in the input can cause a major change in the explanation. This can cause confusion on the user side and open the door for adversarial attacks. In this paper, we study some sources of non-robustness. While there are fundamental reasons for why an explainer that returns a single counterfactual cannot be robust in all instances, we show that some interesting robustness guarantees can be given by reporting multiple rather than a single counterfactual. Unfortunately, the number of counterfactuals that need to be reported for the theoretical guarantees to hold can be prohibitively large. We therefore propose an approximation algorithm that uses a diversity criterion to select a feasible number of most relevant explanations and study its robustness empirically. Our experiments indicate that our method improves the state-of-the-art in generating robust explanations, while maintaining other desirable properties and providing competitive computational performance.
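The abstract mentions selecting a feasible number of explanations with a diversity criterion but gives no algorithmic detail. As a purely illustrative sketch, and not the authors' method, the following uses a generic greedy max-min (farthest-point) heuristic to pick `k` mutually distant counterfactuals from a candidate pool. The function name `select_diverse`, the use of Euclidean distance, and the choice to seed the selection with the candidate closest to the query are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def select_diverse(query: Point, candidates: Sequence[Point], k: int) -> List[Point]:
    """Greedily pick up to k candidates that are spread out from one another.

    Assumption (not from the paper): the first pick is the candidate closest
    to the query; each later pick maximises its minimum Euclidean distance
    to the candidates already selected.
    """
    if k <= 0 or not candidates:
        return []

    remaining = list(candidates)

    # Seed with the candidate nearest to the query input.
    first = min(remaining, key=lambda c: math.dist(query, c))
    selected = [first]
    remaining.remove(first)

    # Farthest-point step: add the candidate least similar to the current set.
    while remaining and len(selected) < k:
        nxt = max(remaining, key=lambda c: min(math.dist(c, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)

    return selected


if __name__ == "__main__":
    pool = [(1.0, 0.0), (1.1, 0.0), (0.0, 2.0), (-3.0, 0.0)]
    print(select_diverse((0.0, 0.0), pool, 3))
```

In this toy pool, the near-duplicate candidate `(1.1, 0.0)` is left out in favour of points lying in different directions from the query, which is the intuition behind reporting a small but diverse set rather than a single counterfactual.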
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Date Type: | Publication |
Status: | Published |
Schools: | Computer Science & Informatics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Publisher: | Association for the Advancement of Artificial Intelligence |
ISBN: | 978-1-57735-887-9 |
Date of First Compliant Deposit: | 15 May 2024 |
Date of Acceptance: | 9 December 2023 |
Last Modified: | 17 Jun 2024 01:30 |
URI: | https://orca.cardiff.ac.uk/id/eprint/168929 |