Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Data mining of range-based classification rules for data characterization

Tziatzios, Achilleas 2014. Data mining of range-based classification rules for data characterization. PhD Thesis, Cardiff University.
Item availability restricted.

PDF - Accepted Post-Print Version: 2014tziatziosaphd.pdf (1MB)
PDF - Supplemental Material: C__Users_scmhmw_Desktop_tziatziosa.pdf (73kB), restricted to Repository staff only

Abstract

Advances in data gathering have led to the creation of very large collections in fields as diverse as industrial site sensor measurements and the account statuses of a financial institution's clients. The ability to learn classification rules, that is, rules that associate specific attribute values with a specific class label, from this data is important and useful in a range of applications. While many methods to facilitate this task have been proposed, existing work has focused on categorical datasets, and very few solutions have been developed that can derive classification rules over associated continuous ranges (numerical intervals). Furthermore, these solutions have relied solely on classification performance as a means of evaluation and therefore focus on mining mutually exclusive classification rules and on correctly predicting the most dominant class values. As a result, existing solutions demonstrate only limited utility when applied to data characterization tasks. This thesis proposes a method, inspired by classification association rule mining, that derives range-based classification rules from numerical data. The presented method searches for associated numerical ranges that have a class value as their consequent and meet a set of user-defined criteria. A new interestingness measure is proposed for evaluating the density of range-based rules, and four heuristic-based approaches are presented for targeting different sets of rules. Extensive experiments demonstrate the effectiveness of the new algorithm for classification tasks when compared to existing solutions and its utility as a solution for data characterization.
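
To make the notion of a range-based classification rule concrete, the following minimal Python sketch represents a rule as a set of numerical intervals with a class label as its consequent, and scores it against a small dataset. It is an illustration only, not the thesis's algorithm: the names `RangeRule` and `support_confidence`, the attribute names, and the sample records are all hypothetical, and support and confidence are the standard association rule measures used here as stand-ins for the user-defined criteria. The density measure proposed in the thesis is not defined in this abstract and is therefore not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A record pairs numerical attribute values with a class label.
Record = Tuple[Dict[str, float], str]


@dataclass
class RangeRule:
    """Hypothetical rule: IF every attribute lies in its interval THEN label."""
    ranges: Dict[str, Tuple[float, float]]  # attribute -> (low, high), inclusive
    label: str

    def covers(self, values: Dict[str, float]) -> bool:
        # A record is covered when all antecedent intervals contain its values.
        return all(lo <= values[attr] <= hi
                   for attr, (lo, hi) in self.ranges.items())


def support_confidence(rule: RangeRule,
                       records: List[Record]) -> Tuple[float, float]:
    """Standard support and confidence (assumed criteria, not the thesis's)."""
    covered = [label for values, label in records if rule.covers(values)]
    hits = sum(1 for label in covered if label == rule.label)
    support = hits / len(records) if records else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Made-up sensor readings, for illustration only.
records: List[Record] = [
    ({"temp": 71.0, "pressure": 2.1}, "fault"),
    ({"temp": 75.5, "pressure": 2.4}, "fault"),
    ({"temp": 73.0, "pressure": 2.2}, "normal"),
    ({"temp": 60.0, "pressure": 1.0}, "normal"),
    ({"temp": 58.0, "pressure": 2.3}, "normal"),
]

rule = RangeRule(ranges={"temp": (70.0, 80.0), "pressure": (2.0, 2.5)},
                 label="fault")
support, confidence = support_confidence(rule, records)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy data the rule covers three records, two of which carry the class `fault`. A rule with imperfect confidence like this one may still describe a region of the data well, which is the kind of characterization use the abstract contrasts with purely predictive evaluation.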

Item Type: Thesis (PhD)
Status: Unpublished
Schools: Computer Science & Informatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Uncontrolled Keywords: Data mining; Classification rules; Data characterization; Continuous data
Date of First Compliant Deposit: 30 March 2016
Last Modified: 19 Mar 2016 23:47
URI: https://orca.cardiff.ac.uk/id/eprint/65902
