Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Dimension reduction methods for high-dimensional datasets

Randell, Hayley 2022. Dimension reduction methods for high-dimensional datasets. MPhil Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version

Abstract

In recent years, computing power has increased massively, which has in turn led to an increase in the size of datasets. This steep increase has created a pressing need for more modern ways of analysing such data. Classical methods were intended for a low-dimensional setting; hence an increasingly popular approach to analysing large data is to first apply a dimension reduction technique that projects the data into a lower dimension. A 'good' dimension reduction technique accurately estimates the correct dimension reduction subspace without a significant impact on computational efficiency. Many dimension reduction methods have already been developed, but few achieve a high level of accuracy without sacrificing computation time. Our aim is to develop a method that rivals both the most accurate and the most computationally efficient existing methods. Another common drawback of classic methods is that few are realistic options for data where the dimension exceeds the sample size: many depend on calculating the inverse of the covariance matrix of the predictor variables, which becomes singular as the dimension surpasses the sample size. It has also been shown that many classic estimators of the central dimension reduction subspace do not remain consistent when the dimension is larger than the sample size. This work makes two main contributions. First, we have developed a dimension reduction method using Distance-Weighted Discrimination (DWD), which has increased accuracy compared with classic methods and is computationally faster than more recent methods. Second, we have developed a dimension reduction method, in the form of a feature partitioning algorithm, which can tackle larger datasets without being restricted by the dimension and which further improves computational efficiency compared with classic methods.
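
The abstract's point about the covariance matrix can be illustrated with a small numerical check. The sketch below is not taken from the thesis: it assumes NumPy, uses simulated Gaussian predictors, and the function name `sample_covariance_rank` is hypothetical. It relies only on the standard fact that a sample covariance matrix computed from n observations has rank at most n - 1, so it cannot be inverted once the dimension reaches the sample size.

```python
import numpy as np


def sample_covariance_rank(n_samples: int, n_features: int, seed: int = 0) -> int:
    """Return the numerical rank of the sample covariance of simulated data.

    Hypothetical helper for illustration only; the data are i.i.d. standard
    normal draws, not data from the thesis.
    """
    rng = np.random.default_rng(seed)
    # Rows are observations, columns are predictor variables.
    X = rng.standard_normal((n_samples, n_features))
    # rowvar=False treats each column as a variable, giving a
    # (n_features x n_features) covariance matrix.
    cov = np.cov(X, rowvar=False)
    return int(np.linalg.matrix_rank(cov))


# Low-dimensional setting: many more samples than predictors.
rank_low = sample_covariance_rank(n_samples=200, n_features=5)

# High-dimensional setting: more predictors than samples.
rank_high = sample_covariance_rank(n_samples=20, n_features=50)

print("rank with n=200, p=5: ", rank_low)
print("rank with n=20,  p=50:", rank_high)
```

In the second setting the rank is bounded by the sample size rather than the dimension, so the 50 x 50 covariance matrix is singular and any method that requires its inverse cannot be applied directly.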

Item Type: Thesis (MPhil)
Date Type: Completion
Status: Unpublished
Schools: Mathematics
Date of First Compliant Deposit: 6 October 2022
Last Modified: 06 May 2023 02:38
URI: https://orca.cardiff.ac.uk/id/eprint/152961
