Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Bayesian networks for classification, clustering, and high-dimensional data visualisation

Ruz Heredia, Gonzalo Andres 2008. Bayesian networks for classification, clustering, and high-dimensional data visualisation. PhD Thesis, Cardiff University.

PDF (U585111.pdf): Accepted Post-Print Version, 6MB

Abstract

This thesis presents new developments for a particular class of Bayesian networks in which the number of parent nodes that each node can have is limited. This restriction yields structures of low complexity (measured by the number of edges), which enables the formulation of optimal algorithms for learning Bayesian networks from data. The new developments focus on three topics: classification, clustering, and high-dimensional data visualisation (topographic map formation). For classification, a new learning algorithm for Bayesian networks is introduced which generates simple Bayesian network classifiers. This approach creates a new class of networks in an area that was previously limited mostly to two well-known models, the naive Bayesian (NB) classifier and the Tree Augmented Naive Bayes (TAN) classifier. The proposed learning algorithm enhances the NB model by adding a Bayesian monitoring system. As a result, the complexity of the resulting network is determined by the input data, yielding structures which model the data distribution more realistically and thereby improve classification performance. Bayesian networks have received less research attention for clustering than for classification. A new unsupervised learning algorithm is introduced for three types of Bayesian network classifiers, enabling them to carry out clustering tasks. The resulting models assign clusters probabilistically, using the posterior probability of a data point belonging to each of the clusters. A key characteristic of the proposed clustering models, which traditional clustering techniques do not have, is the ability to show the probabilistic dependencies amongst the variables for each cluster. This feature enables a better understanding of each cluster.
The final part of this thesis introduces one of the first developments for Bayesian networks to perform topographic mapping. A new unsupervised learning algorithm for the NB model is presented which enables the projection of high-dimensional data into a two-dimensional space for visualisation purposes. The Bayesian network formalism of the model allows the learning algorithm to generate a density model of the input data and provides a cost function for monitoring convergence during training. Other mapping techniques lack these important features, a limitation which has been overcome in this research.

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Engineering
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
ISBN: 9781303213250
Date of First Compliant Deposit: 30 March 2016
Last Modified: 25 Oct 2017 14:31
URI: https://orca.cardiff.ac.uk/id/eprint/54722
