Cardiff University | Prifysgol Caerdydd ORCA
Online Research @ Cardiff 

Imbalanced text classification: A term weighting approach

Liu, Ying ORCID: https://orcid.org/0000-0001-9319-5940, Loh, Han Tong and Sun, Aixin 2009. Imbalanced text classification: A term weighting approach. Expert Systems with Applications 36 (1), pp. 690-701. 10.1016/j.eswa.2007.10.042

Full text not available from this repository.

Abstract

The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We tackle this problem using a simple probability-based term weighting scheme to better distinguish documents in minor categories. This new scheme directly utilizes two critical information ratios, i.e. relevance indicators. Such relevance indicators are well supported by probability estimates which embody the category membership. Our experimental study, using both Support Vector Machines and Naïve Bayes classifiers and an extensive comparison with other classic weighting schemes over two benchmark data sets, including Reuters-21578, shows significant improvement for minor categories, while the performance for major categories is not jeopardized. Our approach suggests a simple and effective solution to boost the performance of text classification over skewed data sets.
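
The abstract does not state the weighting formula, so the following is only an illustrative sketch of how a ratio-based, category-aware term weight could be computed. It assumes the two ratios are A/B and A/C, where A counts in-category documents containing the term, B counts out-of-category documents containing it, and C counts in-category documents lacking it. The function name `ratio_term_weights`, the `smoothing` parameter, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ratio_term_weights(docs, labels, category, smoothing=1.0):
    """Weight each term for one category from two document-count ratios.

    docs: list of token lists; labels: one category label per document.
    Assumed ratios (not confirmed by the abstract): A/B and A/C.
    smoothing is added to each denominator to avoid division by zero.
    """
    in_cat = Counter()   # A: in-category documents containing the term
    out_cat = Counter()  # B: out-of-category documents containing the term
    n_in_cat = sum(1 for y in labels if y == category)

    for tokens, y in zip(docs, labels):
        target = in_cat if y == category else out_cat
        for term in set(tokens):  # count each term once per document
            target[term] += 1

    weights = {}
    for term in set(in_cat) | set(out_cat):
        a = in_cat[term]
        b = out_cat[term]
        c = n_in_cat - a  # C: in-category documents without the term
        ratio_vs_other = a / (b + smoothing)
        ratio_vs_missing = a / (c + smoothing)
        weights[term] = math.log(1.0 + ratio_vs_other * ratio_vs_missing)
    return weights


# Toy corpus in which "grain" is the minor category of interest.
docs = [
    ["wheat", "price"],
    ["wheat", "export"],
    ["oil", "price"],
    ["oil", "export"],
    ["stock", "price"],
]
labels = ["grain", "grain", "crude", "crude", "other"]
weights = ratio_term_weights(docs, labels, "grain")
```

In this toy corpus, "wheat" occurs only in the "grain" documents and so receives the largest weight, while "oil" never occurs in them and receives a weight of zero. In a full pipeline the per-category weight would typically be multiplied by a term frequency before being passed to a classifier; that step is omitted here.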

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Uncontrolled Keywords: Text classification; Imbalanced data; Term weighting scheme
ISSN: 0957-4174
Last Modified: 24 Oct 2022 12:02
URI: https://orca.cardiff.ac.uk/id/eprint/50167

Citation Data

Cited 211 times in Scopus.
