An oversampling method for addressing imbalanced data utilizing K-means clustering and membership-based data partitioning

Zhou, Hongfang, Wang, Yating, Cui, Shimiao, Tong, Jiahao, Yang, Xiuhong and Karakus, Oktay

2026. An oversampling method for addressing imbalanced data utilizing K-means clustering and membership-based data partitioning. Engineering Applications of Artificial Intelligence 167, 113799. 10.1016/j.engappai.2026.113799

Full text not available from this repository.

Official URL: https://doi.org/10.1016/j.engappai.2026.113799

Abstract

Class imbalance poses a critical challenge in real-world classification tasks, such as medical diagnosis, fraud detection, and fault monitoring, where accurate recognition of minority instances is vital. Oversampling is an effective approach to address this issue, but most existing methods tend to generate noisy synthetic samples in certain distributions, which degrades classification performance. This paper proposes a novel oversampling method called KM-MSMOTE (K-means and Membership-guided Synthetic Minority Oversampling Technique). The proposed method integrates K-means clustering with membership-based region partitioning to enhance the quality and representativeness of synthetic samples. Specifically, KM-MSMOTE filters clusters based on class dominance, adaptively assigns sampling weights, and divides each cluster into safe, overlapping, and noisy regions to control the generation of synthetic samples. Comprehensive experiments on 24 real-world datasets demonstrate that KM-MSMOTE consistently outperforms several state-of-the-art oversampling methods, achieving average improvements of 6.4–6.8% across multiple evaluation metrics when combined with Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN). Statistical analysis using the Friedman and Nemenyi post-hoc tests confirms the significance of these improvements. Moreover, KM-MSMOTE demonstrates superior robustness in challenging scenarios, achieving an average AUC improvement of 4.1% under noise interference and 4.7% under class overlap conditions compared to baseline methods. These results suggest that KM-MSMOTE provides an effective solution for engineering applications requiring reliable classification in imbalanced, noisy data environments.
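
The abstract names three stages: filtering clusters by class dominance, partitioning minority samples into safe, overlapping, and noisy regions, and generating synthetic samples. The sketch below illustrates the general shape of such a pipeline and is not the published KM-MSMOTE algorithm. Everything specific in it is an assumption: the function names, the `min_ratio` threshold, and the cluster assignments (taken as given, as if produced by K-means) are hypothetical, and a nearest-neighbour rule in the style of Borderline-SMOTE stands in for the paper's membership function, which the abstract does not specify. Adaptive sampling weights are omitted for the same reason.

```python
import math
import random
from collections import Counter

def dominant_minority_clusters(cluster_ids, labels, minority=1, min_ratio=0.5):
    """Keep clusters whose minority share is at least min_ratio (assumed threshold)."""
    totals = Counter(cluster_ids)
    minority_counts = Counter(c for c, y in zip(cluster_ids, labels) if y == minority)
    ratios = {c: minority_counts[c] / totals[c] for c in totals}
    return {c: r for c, r in ratios.items() if r >= min_ratio}

def region_of(point, points, labels, minority=1, k=3):
    """Classify a minority point by the majority share among its k nearest neighbours.

    Stand-in for a membership function: all-majority neighbours -> "noisy",
    at least half majority -> "overlapping", otherwise "safe".
    """
    neighbours = sorted(
        (math.dist(point, q), y) for q, y in zip(points, labels) if q != point
    )[:k]
    if not neighbours:
        return "noisy"
    majority_share = sum(1 for _, y in neighbours if y != minority) / len(neighbours)
    if majority_share == 1.0:
        return "noisy"
    if majority_share >= 0.5:
        return "overlapping"
    return "safe"

def interpolate(a, b, rng):
    """SMOTE-style synthetic point on the segment between two minority samples."""
    t = rng.random()
    return tuple(x + t * (y - x) for x, y in zip(a, b))

# Toy data: label 1 is the minority class; cluster ids are assumed K-means output.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 1]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = dominant_minority_clusters(cluster_ids, labels)
rng = random.Random(0)
# Only minority points in kept clusters that fall in the "safe" region seed new samples.
safe = [p for p, y, c in zip(points, labels, cluster_ids)
        if y == 1 and c in kept and region_of(p, points, labels) == "safe"]
synthetic = [interpolate(*rng.sample(safe, 2), rng) for _ in range(3)]
print(kept, safe, synthetic)
```

In this toy example the lone minority point surrounded by majority samples is treated as noisy and its cluster is discarded, so interpolation happens only inside the minority-dominated cluster. How the published method treats overlapping regions is not stated in the abstract and is left out here.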

Item Type:	Article
Date Type:	Publication
Status:	Published
Schools:	Schools > Computer Science & Informatics
Publisher:	Elsevier
ISSN:	0952-1976
Date of Acceptance:	7 January 2026
Last Modified:	19 Jan 2026 11:00
URI:	https://orca.cardiff.ac.uk/id/eprint/183983
