Zhang, Mengqi
2025.
Explainable machine learning for multivariate models.
PhD Thesis,
Cardiff University.
Item availability restricted.
PDF - Accepted Post-Print Version (6MB)
Restricted to Repository staff only until 22 August 2026 due to copyright restrictions.
PDF (Cardiff University Electronic Publication Form) - Supplemental Material (1MB)
Restricted to Repository staff only.
Abstract
Machine learning and deep learning models have become increasingly popular over the past decade. However, they are often regarded as "black boxes", because the mechanisms behind their predictions remain opaque. This lack of transparency has raised concerns about their trustworthiness and limited their adoption. Various explainable AI (XAI) methods have emerged to address this problem by explaining complex models from different perspectives, such as representative samples, summarised rules, and feature importance scores. However, several recent studies indicate that these explanation methods can produce biased results in the presence of correlated features, especially methods that rely on permutation techniques. One potential reason is that these methods assume feature independence: their explanations are generated from the marginal distribution, which misrepresents the true data distribution and leads to biased estimates of feature effects. In this thesis, we propose several explanation methods that are less affected by correlated features, aiming to generate reliable feature importance scores and enhance the trustworthiness of complex models. We first propose an explanation method specifically for RBF-kernel SVM models, which generates global feature importance scores that are robust to class-irrelevant features. Secondly, we introduce a framework that provides feature importance scores for non-linear SVM models; the proposed methods demonstrate reduced sensitivity to noise and correlated features compared with several state-of-the-art explanation methods. Finally, our third contribution is a model-agnostic explanation method that can be applied to different models. This approach derives feature importance measures by assessing the conditional expected predictions of the model, allowing informative features to be identified efficiently while reducing the influence of irrelevant features, such as suppressor variables.
The proposed methods are validated on synthetic and open-source datasets, demonstrating their effectiveness in generating reliable feature importance scores.
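The bias mechanism the abstract describes can be illustrated with a minimal sketch (not code from the thesis; the `model` and `permutation_importance` names are illustrative): a plain permutation-importance routine shuffles one feature column independently, which amounts to sampling that feature from its marginal distribution and destroys its correlation with the remaining features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x1 and x2 are strongly correlated; the stand-in "model"
# (a hypothetical trained black box) uses only x1.
n = 10_000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # near-duplicate of x1
X = np.column_stack([x1, x2])

def model(X):
    return X[:, 0]

y = model(X)

def permutation_importance(model, X, y, col, rng):
    """Marginal permutation importance: shuffle one column independently.

    Shuffling draws the feature from its marginal distribution and
    breaks its correlation with the other columns -- the independence
    assumption the abstract identifies as a source of bias.
    """
    Xp = X.copy()
    Xp[:, col] = rng.permutation(Xp[:, col])
    return float(np.mean((model(Xp) - y) ** 2))  # rise in squared error

# Shuffling x2 destroys its near-perfect correlation with x1, so the
# permuted rows no longer resemble points from the joint distribution.
Xp = X.copy()
Xp[:, 1] = rng.permutation(Xp[:, 1])
print(np.corrcoef(X.T)[0, 1])   # close to 1 before shuffling
print(np.corrcoef(Xp.T)[0, 1])  # close to 0 after shuffling

for col in (0, 1):
    score = permutation_importance(model, X, y, col, rng)
    print(f"feature {col}: importance = {score:.3f}")
```

Because the model is evaluated on these off-distribution rows, permutation scores for correlated features can be distorted; conditional approaches, such as the conditional-expectation method in the thesis, permute or integrate a feature given the others to avoid this extrapolation.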
Item Type: | Thesis (PhD)
---|---
Date Type: | Completion
Status: | Unpublished
Schools: | Schools > Computer Science & Informatics
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Funders: | China Scholarship Council
Date of First Compliant Deposit: | 22 August 2025
Last Modified: | 27 Aug 2025 11:23
URI: | https://orca.cardiff.ac.uk/id/eprint/180607