Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Interpreting concept models for effective human-machine collaboration

Furby, Jack 2025. Interpreting concept models for effective human-machine collaboration. PhD Thesis, Cardiff University.
Item availability restricted.

PDF (2025furbyjphd.pdf) - Accepted Post-Print Version. Download (71MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.

PDF (Cardiff University Electronic Publication Form) - Supplemental Material (301kB). Restricted to Repository staff only; request a copy.

Abstract

Deep Neural Networks (DNNs) are often considered black boxes due to their opaque decision-making processes. Concept Bottleneck Models (CBMs) aim to overcome this by predicting human-defined concepts as an intermediate step before predicting task labels, thereby enhancing the interpretability of DNNs. In a human-machine setting, greater interpretability enables humans to improve their understanding of, and build trust in, a DNN. However, for interpretability to be meaningful, concept predictions must be grounded in semantically meaningful input features; for example, the pixels depicting a bone break should contribute to the corresponding concept. Existing literature suggests that CBMs often rely on irrelevant features or encode spurious correlations, calling their interpretations into question.

This thesis investigates how CBMs represent concepts and how dataset design and model training influence their interpretability. We evaluate the impact of different concept annotation configurations, emphasising the importance of dataset configuration. Using synthetic and real-world datasets, we demonstrate that CBMs can align concepts with semantically meaningful input features when trained appropriately. We analyse challenges relating to concept correlation and input-feature sensitivity: correlated concepts in the training data can lead to concept representations that encode extraneous information and increase a concept's sensitivity to unrelated input features. To address the challenge of dataset design, we propose best practices for training CBMs that ensure concepts are grounded in semantically meaningful features, minimise leakage, and maintain predictable concept accuracy under input feature manipulations.

We also conduct the first human studies using CBMs to evaluate human interaction in collaborative task settings. Our findings show that CBMs improve interpretability compared to standard DNNs, leading to increased human-machine alignment. However, this increased alignment did not translate to a significant increase in task accuracy. Understanding the model's decision-making process required multiple interactions, and misalignment between the model's and the human's decision-making processes could undermine interpretability and model effectiveness in a collaborative setting.
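The concept-bottleneck architecture the abstract describes — input features mapped to human-defined concept predictions, with the task label predicted from those concepts alone — can be sketched as a toy two-stage model. This is a minimal illustration with random weights, not the thesis's implementation; the sizes, names, and the intervention step are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 16 input features, 4 human-defined concepts, 3 task labels.
n_features, n_concepts, n_labels = 16, 4, 3

# Randomly initialised weights stand in for a trained model.
W_concept = rng.normal(size=(n_features, n_concepts))
W_task = rng.normal(size=(n_concepts, n_labels))

def predict(x):
    # Stage 1: map input features to concept probabilities (the "bottleneck").
    concepts = sigmoid(x @ W_concept)
    # Stage 2: the task label is predicted from the concepts alone,
    # so a human can inspect the concept vector the decision rests on.
    task_logits = concepts @ W_task
    return concepts, task_logits

x = rng.normal(size=(n_features,))
concepts, task_logits = predict(x)
label = int(np.argmax(task_logits))

# Human-machine collaboration: a human can intervene on a concept they
# believe was mispredicted, and the task prediction updates accordingly.
intervened = concepts.copy()
intervened[0] = 1.0  # override concept 0 to "present"
revised_label = int(np.argmax(intervened @ W_task))
```

Because the label depends only on the concept vector, grounding each concept in the right input features (the central concern of the thesis) is what makes both the explanation and the intervention meaningful.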

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Funders: EPSRC and IBM UK via an ICASE award
Date of First Compliant Deposit: 25 September 2025
Date of Acceptance: 8 September 2025
Last Modified: 25 Sep 2025 15:26
URI: https://orca.cardiff.ac.uk/id/eprint/181338
