Furby, Jack 2025. Interpreting concept models for effective human-machine collaboration. PhD Thesis, Cardiff University.
Item availability restricted.

- PDF - Accepted Post-Print Version (71MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
- PDF (Cardiff University Electronic Publication Form) - Supplemental Material (301kB). Restricted to repository staff only.
Abstract
Deep Neural Networks (DNNs) are often considered black boxes due to their opaque decision-making processes. Concept Bottleneck Models (CBMs) aim to overcome this by predicting human-defined concepts as an intermediate step before predicting task labels, thus enhancing the interpretability of DNNs. In a human-machine setting, greater interpretability enables humans to improve their understanding of and build trust in a DNN. However, for interpretability to be meaningful, concept predictions must be grounded in semantically meaningful input features. For example, pixels depicting a bone break should contribute to the corresponding concept. Existing literature suggests that CBMs often rely on irrelevant features or encode spurious correlations, calling their interpretations into question. This thesis investigates how CBMs represent concepts and how dataset design and model training influence their interpretability. We evaluate the impact of different concept annotation configurations, emphasising the importance of dataset configuration. Using synthetic and real-world datasets, we demonstrate that CBMs can align concepts with semantically meaningful input features when trained appropriately. We analyse challenges with respect to concept correlation and input feature sensitivity, showing that correlated concepts in the training data can lead to concept representations that encode extraneous information and increase concept sensitivity to unrelated input features. To address the challenge of dataset design, we propose best practices for training CBMs that ensure concepts are grounded in semantically meaningful features, minimise leakage and maintain predictable concept accuracy under input feature manipulations. We conduct the first human studies using CBMs to evaluate human interaction in collaborative task settings. Our findings show that CBMs improve interpretability compared to standard DNNs, leading to increased human-machine alignment. However, this increased alignment did not translate into a significant increase in task accuracy. Understanding the model's decision-making process required multiple interactions, and misalignment between the model's and the human's decision-making processes could undermine interpretability and model effectiveness in a collaborative setting.
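For readers unfamiliar with the architecture the abstract describes, the following is a minimal PyTorch sketch of a concept bottleneck model: the task head sees only the predicted concepts, which is what makes the intermediate predictions inspectable. The layer sizes, concept and class counts, and the joint-training loss weighting are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    """Minimal CBM sketch: input -> concept predictions -> task label.

    Dimensions and backbone are illustrative, not the thesis's setup.
    """
    def __init__(self, input_dim=2048, n_concepts=112, n_classes=200):
        super().__init__()
        # Backbone maps input features to concept logits (the "bottleneck").
        self.concept_predictor = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Linear(512, n_concepts),
        )
        # Task head sees ONLY the predicted concepts, not the raw input.
        self.label_predictor = nn.Linear(n_concepts, n_classes)

    def forward(self, x):
        concept_logits = self.concept_predictor(x)
        # Per-concept probabilities: the human-interpretable intermediate step.
        concepts = torch.sigmoid(concept_logits)
        label_logits = self.label_predictor(concepts)
        return concepts, label_logits

# Joint training: a weighted sum of concept and task losses (one of the
# standard CBM training regimes, alongside independent and sequential).
model = ConceptBottleneckModel()
concept_loss_fn = nn.BCELoss()
task_loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 2048)                          # dummy input batch
c_true = torch.randint(0, 2, (8, 112)).float()    # binary concept labels
y_true = torch.randint(0, 200, (8,))              # task labels

c_pred, y_logits = model(x)
loss = concept_loss_fn(c_pred, c_true) + 0.5 * task_loss_fn(y_logits, y_true)
loss.backward()
```

Because the label predictor receives only `concepts`, a human collaborator can read, and in principle correct, the intermediate concept predictions before the task label is produced; the 0.5 loss weight here is an arbitrary placeholder for the usual concept/task trade-off hyperparameter.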
| Item Type: | Thesis (PhD) |
|---|---|
| Date Type: | Completion |
| Status: | Unpublished |
| Schools: | Schools > Computer Science & Informatics |
| Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Funders: | EPSRC and IBM UK via an ICASE award |
| Date of First Compliant Deposit: | 25 September 2025 |
| Date of Acceptance: | 8 September 2025 |
| Last Modified: | 25 Sep 2025 15:26 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/181338 |