Zhou, Hongfang, Wang, Yating, Cui, Shimiao, Tong, Jiahao, Yang, Xiuhong and Karakus, Oktay ORCID: https://orcid.org/0000-0001-8009-9319
2026.
An oversampling method for addressing imbalanced data utilizing K-means clustering and membership-based data partitioning.
Engineering Applications of Artificial Intelligence 167, 113799.
10.1016/j.engappai.2026.113799
Abstract
Class imbalance poses a critical challenge in real-world classification tasks, such as medical diagnosis, fraud detection, and fault monitoring, where accurate recognition of minority instances is vital. Oversampling is an effective approach to this problem, but most existing methods tend to generate noisy synthetic samples under certain data distributions, which degrades classification performance. This paper proposes a novel oversampling method called KM-MSMOTE (K-means and Membership-guided Synthetic Minority Oversampling Technique), which integrates K-means clustering with membership-based region partitioning to improve the quality and representativeness of synthetic samples. Specifically, KM-MSMOTE filters clusters based on class dominance, adaptively assigns sampling weights, and divides each cluster into safe, overlapping, and noisy regions to control the generation of synthetic samples. Comprehensive experiments on 24 real-world datasets show that KM-MSMOTE consistently outperforms several state-of-the-art oversampling methods, achieving average improvements of 6.4–6.8 % across multiple evaluation metrics when combined with Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) classifiers. Friedman and Nemenyi post-hoc tests confirm that these improvements are statistically significant. Moreover, KM-MSMOTE shows superior robustness in challenging scenarios, achieving average AUC improvements over baseline methods of 4.1 % under noise interference and 4.7 % under class overlap. These results suggest that KM-MSMOTE provides an effective solution for engineering applications requiring reliable classification in imbalanced, noisy data environments.
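The pipeline described in the abstract (cluster, filter clusters by class dominance, weight the retained clusters, partition minority points into regions, then interpolate) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the membership function, the dominance criterion, or the weighting rule, so this sketch assumes stand-ins for each. Lloyd's K-means with farthest-point initialization, the share of minority points among the `m` nearest neighbors as a "membership" score, a fixed minority-fraction threshold (`dominance`) as the cluster filter, and mean intra-cluster distance as the sampling weight. All function names and parameters (`simple_kmeans`, `minority_membership`, `km_membership_oversample`, `k`, `m`, `dominance`) are hypothetical.

```python
import numpy as np


def simple_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-means with farthest-point initialization; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next center: point farthest from existing ones
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def minority_membership(X, y, minority, m=5):
    """Hypothetical membership score: share of minority points among the m nearest neighbors."""
    dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    nn = np.argsort(dists, axis=1)[:, :m]
    return (y[nn] == minority).mean(axis=1)


def km_membership_oversample(X, y, minority=1, k=4, m=5, dominance=0.5, seed=0):
    """Sketch of cluster-filtered, region-aware SMOTE-style oversampling (assumed rules)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_needed = int((y != minority).sum() - (y == minority).sum())
    if n_needed <= 0:
        return X, y

    labels = simple_kmeans(X, k, seed=seed)
    member = minority_membership(X, y, minority, m)
    # Assumed thresholds: noisy = no minority neighbors, overlap = minority outnumbered.
    region = np.where(member == 0, "noisy", np.where(member < 0.5, "overlap", "safe"))

    clusters, weights = [], []
    for j in range(k):
        in_j = labels == j
        seeds = np.where(in_j & (y == minority) & (region != "noisy"))[0]
        # Keep only minority-dominated clusters with at least two usable seeds.
        if in_j.sum() == 0 or (y[in_j] == minority).mean() < dominance or len(seeds) < 2:
            continue
        pts = X[seeds]
        spread = np.sqrt(((pts[:, None] - pts[None, :]) ** 2).sum(axis=2)).mean()
        clusters.append(seeds)
        weights.append(spread + 1e-12)  # sparser clusters receive more synthetic samples

    if not clusters:
        return X, y  # no cluster passed the filter; leave the data unchanged

    weights = np.array(weights) / np.sum(weights)
    counts = rng.multinomial(n_needed, weights)  # split the budget across clusters

    synthetic = []
    for seeds, c in zip(clusters, counts):
        for _ in range(c):
            i, j = rng.choice(seeds, size=2, replace=False)
            # Overlap-region seeds move at most halfway toward the partner point.
            upper = 0.5 if region[i] == "overlap" else 1.0
            synthetic.append(X[i] + rng.uniform(0.0, upper) * (X[j] - X[i]))

    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new
```

Because every synthetic point is a convex combination of two non-noisy minority points from the same retained cluster, points flagged as noisy never act as seeds, which is the intuition behind restricting generation by region. The paper's actual criteria may differ from every rule assumed here.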
| Item Type: | Article |
|---|---|
| Date Type: | Publication |
| Status: | Published |
| Schools: | Schools > Computer Science & Informatics |
| Publisher: | Elsevier |
| ISSN: | 0952-1976 |
| Date of Acceptance: | 7 January 2026 |
| Last Modified: | 19 Jan 2026 11:00 |
| URI: | https://orca.cardiff.ac.uk/id/eprint/183983 |