Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

On the predictive potential of kernel principal components

Jones, Benjamin, Artemiou, Andreas and Li, Bing 2020. On the predictive potential of kernel principal components. Electronic Journal of Statistics 14 (1) , pp. 1-23. 10.1214/19-EJS1655

PDF - Accepted Post-Print Version
Available under License Creative Commons Attribution.



We give a probabilistic analysis of a phenomenon in statistics that, until recently, had not received a convincing explanation: the leading principal components tend to possess more predictive power for a response variable than lower-ranking ones, even though the procedure is unsupervised. Our result, in its most general form, shows that the phenomenon goes far beyond the context of linear regression and classical principal components. If an arbitrary distribution for the predictor $X$ and an arbitrary conditional distribution for $Y \vert X$ are chosen, then any measurable function $g(Y)$, subject to a mild condition, tends to be more correlated with the higher-ranking kernel principal components than with the lower-ranking ones. The ``arbitrariness'' is formulated in terms of unitary invariance, and the tendency is then explicitly quantified by exploring how unitary invariance relates to the Cauchy distribution. The most general results, for technical reasons, are shown for the case where the kernel space is finite dimensional. The occurrence of this tendency in real world databases is also investigated to show that our results are consistent with observation.
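The role of the Cauchy distribution can be illustrated with a small simulation. The sketch below is not the paper's kernel setting: it assumes the simplest linear case with only two principal components, fixed eigenvalues `lam1 > lam2`, and a regression direction whose coordinates in the eigenbasis are independent standard normals (one rotation-invariant choice). Under those assumptions the squared correlation of the response with component $i$ is proportional to $\lambda_i b_i^2$, and because $b_2 / b_1$ is standard Cauchy, the leading component wins with probability $(2/\pi)\arctan\sqrt{\lambda_1/\lambda_2}$. The function names are hypothetical and chosen only for this example.

```python
import numpy as np


def leading_component_win_rate(lam1, lam2, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(first PC is more correlated with Y than second).

    Assumption: Y = b'X + noise, with b drawn as standard normal coordinates
    in the eigenbasis of Cov(X); this is an illustrative model only.
    """
    rng = np.random.default_rng(seed)
    # Coordinates of the random regression direction in the eigenbasis.
    b = rng.standard_normal((n_draws, 2))
    # Squared correlation with PC i is proportional to lam_i * b_i^2
    # (the common factor 1 / Var(Y) cancels in the comparison).
    score1 = lam1 * b[:, 0] ** 2
    score2 = lam2 * b[:, 1] ** 2
    return float(np.mean(score1 > score2))


def cauchy_prediction(lam1, lam2):
    """Closed form implied by b_2 / b_1 being standard Cauchy."""
    return 2.0 / np.pi * np.arctan(np.sqrt(lam1 / lam2))


estimate = leading_component_win_rate(4.0, 1.0)
theory = cauchy_prediction(4.0, 1.0)
print(f"simulated: {estimate:.3f}, arctan formula: {theory:.3f}")
```

With equal eigenvalues the formula gives exactly one half, and it increases towards one as the ratio `lam1 / lam2` grows, which matches the qualitative tendency described in the abstract.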

Item Type: Article
Date Type: Publication
Status: Published
Schools: Mathematics
Subjects: Q Science > QA Mathematics
Publisher: Institute of Mathematical Statistics
ISSN: 1935-7524
Date of First Compliant Deposit: 9 December 2019
Date of Acceptance: 7 December 2019
Last Modified: 12 Mar 2020 10:02
