Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Distance measures and whitening procedures for high dimensional data

O'Riordan, Emily 2023. Distance measures and whitening procedures for high dimensional data. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version

Abstract

The need to effectively analyse high dimensional data is increasingly crucial to many fields as data collection and storage capabilities continue to grow. Working with high dimensional data is fraught with difficulties, making many data analysis methods inadvisable, unstable or entirely unavailable. The Mahalanobis distance and data whitening are two methods that are integral to multivariate data analysis. These methods are reliant on the inverse of the covariance matrix, which is often non-existent or unstable in high dimensions. The methods that are currently used to circumvent singularity in the covariance matrix often impose structural assumptions on the data, which are not always appropriate or known. In this thesis, three novel methods are proposed. Two of these methods are distance measures which measure the proximity of a point x to a set of points X. The simplicial distances find the average volume of all k-dimensional simplices between x and vertices of X. The minimal-variance distances aim to minimize the variance of the distances produced, while adhering to a constraint ensuring similar behaviour to the Mahalanobis distance. Finally, the minimal-variance whitening method is detailed. This is a method of data whitening, and is constructed by minimizing the total variation of the transformed data subject to a constraint. All of these novel methods are shown to behave similarly to the Mahalanobis distances and data whitening methods that are used for full-rank data. Furthermore, unlike the methods that rely on the inverse covariance matrix, these new methods are well-defined for degenerate data and do not impose structural assumptions. This thesis explores the aims, constructions and limitations of these new methods, and offers many empirical examples and comparisons of their performances when used with high dimensional data.
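To make the abstract's starting point concrete, the following is a minimal sketch of the classical Mahalanobis distance and a standard symmetric whitening transform, together with a check of why both break down when there are fewer samples than dimensions. It is not code from the thesis and does not implement the simplicial or minimal-variance methods; the function names (`mahalanobis_sq`, `whiten`, `covariance_rank`) and the simulated Gaussian data are assumptions made purely for illustration.

```python
import numpy as np


def mahalanobis_sq(x, X):
    """Squared Mahalanobis distance from point x to the sample X (rows = observations)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)  # d x d sample covariance
    diff = x - mu
    # Solve S z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(S, diff))


def whiten(X):
    """Symmetric whitening: centre X and multiply by S^(-1/2). Requires full-rank S."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)  # S is symmetric, so eigh applies
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (X - mu) @ W


def covariance_rank(X):
    """Numerical rank of the sample covariance matrix of X."""
    return int(np.linalg.matrix_rank(np.cov(X, rowvar=False)))


rng = np.random.default_rng(0)

# Full-rank case: many more observations (200) than dimensions (3).
X_low = rng.normal(size=(200, 3))
d_low = mahalanobis_sq(np.ones(3), X_low)
Z_low = whiten(X_low)  # sample covariance of Z_low is the identity

# Degenerate case: 10 observations in 50 dimensions.
# The sample covariance has rank at most n - 1, so its inverse does not exist
# and neither function above is well-defined for this data.
X_high = rng.normal(size=(10, 50))
rank_high = covariance_rank(X_high)
```

In the second case the covariance matrix is 50 x 50 but its rank cannot exceed 9, which is the singularity the abstract refers to. Common workarounds such as a pseudo-inverse or a regularised covariance estimate are the kind of existing approaches the abstract contrasts with the proposed methods.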

Item Type: Thesis (PhD)
Date Type: Completion
Status: Unpublished
Schools: Mathematics
Date of First Compliant Deposit: 1 February 2023
Last Modified: 01 Feb 2023 10:20
URI: https://orca.cardiff.ac.uk/id/eprint/156406
