Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Feature selection using Joint Mutual Information Maximisation

Bennasar, Mohamed, Hicks, Yulia ORCID: https://orcid.org/0000-0002-7179-4587 and Setchi, Rossitza ORCID: https://orcid.org/0000-0002-7207-6544 2015. Feature selection using Joint Mutual Information Maximisation. Expert Systems with Applications 42 (22), pp. 8520-8532. doi:10.1016/j.eswa.2015.07.007

PDF - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Feature selection is used in many application areas relevant to expert and intelligent systems, such as data mining and machine learning, image processing, anomaly detection, bioinformatics and natural language processing. Feature selection based on information theory is a popular approach due to its computational efficiency, scalability in terms of the dataset dimensionality, and independence from the classifier. Common drawbacks of this approach are the lack of information about the interaction between the features and the classifier, and the selection of redundant and irrelevant features. The latter is due to the limitations of the employed goal functions, which lead to overestimation of the feature significance. To address this problem, this article introduces two new nonlinear feature selection methods, namely Joint Mutual Information Maximisation (JMIM) and Normalised Joint Mutual Information Maximisation (NJMIM). Both methods use mutual information and the ‘maximum of the minimum’ criterion, which alleviates the problem of overestimation of the feature significance, as demonstrated both theoretically and experimentally. The proposed methods are compared with five competing methods on eleven publicly available datasets. The results demonstrate that the JMIM method outperforms the other methods on most tested public datasets, reducing the relative average classification error by almost 6% in comparison to the next best performing method. The statistical significance of the results is confirmed by the ANOVA test. Moreover, this method produces the best trade-off between accuracy and stability.
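
The ‘maximum of the minimum’ criterion mentioned in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. It assumes discrete-valued features, a simple plug-in (count-based) estimate of mutual information, and a greedy formulation in which the first feature maximises I(f; C) and every later candidate f is scored by the minimum of the joint mutual information I(f, s; C) over the already selected features s. The function names (mutual_information, joint_mutual_information, jmim_select) are hypothetical and chosen for illustration only; the abstract itself does not spell out these details.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def joint_mutual_information(f1, f2, labels):
    """I(f1, f2; C): treat the pair (f1, f2) as one discrete variable."""
    return mutual_information(list(zip(f1, f2)), labels)


def jmim_select(features, labels, k):
    """Greedy max-of-min selection (assumed formulation, see lead-in).

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels, same length as each feature.
    k:        number of features to select.
    """
    remaining = dict(features)

    # Step 1: start with the single most informative feature.
    first = max(remaining, key=lambda name: mutual_information(remaining[name], labels))
    selected = [first]
    del remaining[first]

    # Step 2: repeatedly add the candidate whose worst-case joint
    # information with any already selected feature is largest.
    while remaining and len(selected) < k:
        def score(name):
            return min(
                joint_mutual_information(remaining[name], features[s], labels)
                for s in selected
            )

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]

    return selected
```

Taking the minimum over the selected set means a candidate that is redundant with even one selected feature receives a low score, which is the intuition behind reducing overestimated feature significance. For example, with a feature, an exact copy of it, and a second complementary feature, the sketch selects the complementary feature rather than the copy as its second choice.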

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Subjects: T Technology > T Technology (General)
Additional Information: This is an open access article under the CC BY-NC-ND 4.0 International license.
Publisher: Elsevier
ISSN: 0957-4174
Date of First Compliant Deposit: 30 March 2016
Date of Acceptance: 4 July 2015
Last Modified: 06 Jul 2023 18:05
URI: https://orca.cardiff.ac.uk/id/eprint/76215

Citation Data

Cited 396 times in Scopus.
