A comparative study of ensemble learning techniques for imbalanced classification problems

Adefemi Ayodele *

Department of Computer Science and Digital Technologies, University of East London, London, England, United Kingdom.
 
Research Article
World Journal of Advanced Research and Reviews, 2023, 19(01), 1633-1643
Article DOI: 10.30574/wjarr.2023.19.1.1202
 
Publication history: 
Received on 04 June 2023; revised on 19 July 2023; accepted on 21 July 2023
 
Abstract: 
Imbalanced classification problems frequently arise in critical domains such as fraud detection, medical diagnosis, cybersecurity, and anomaly detection, where the minority class often carries disproportionate importance despite its scarcity. Traditional machine learning algorithms tend to favour the majority class, leading to poor minority-class detection and costly misclassifications. This study evaluates the effectiveness of ensemble learning techniques (Bagging, Boosting, Random Forest, EasyEnsemble, and Balanced Random Forest) in managing class imbalance. On several real-world benchmark datasets with varying imbalance ratios and feature complexities, the methods are assessed with metrics suited to imbalanced scenarios: F1-score, area under the precision-recall curve (PR-AUC), and geometric mean (G-mean). Results indicate that boosting-based methods, particularly Gradient Boosting Machines (GBM), perform best on most datasets, especially in terms of PR-AUC and G-mean, whereas Balanced Random Forest is stronger on datasets with extreme imbalance or high feature dimensionality. These findings underscore that the best ensemble method depends heavily on dataset characteristics and operational constraints. The analysis offers practical guidance for aligning ensemble strategies with real-world requirements, helping researchers and practitioners build more robust and accurate models for imbalanced classification.
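
To make the evaluation protocol concrete, the sketch below compares the five ensemble families on a synthetic imbalanced dataset and reports F1-score, PR-AUC, and G-mean (the square root of sensitivity times specificity). It is a minimal illustration using scikit-learn and imbalanced-learn, not the paper's actual benchmark pipeline; the 95:5 class ratio, dataset size, and default hyperparameters are assumptions chosen for brevity.

# Illustrative sketch only: synthetic data and default hyperparameters
# stand in for the paper's benchmark datasets and tuned models.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, average_precision_score
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier
from imblearn.metrics import geometric_mean_score

# Synthetic dataset with an assumed 95:5 majority-to-minority class ratio.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "Bagging": BaggingClassifier(random_state=42),
    "GBM": GradientBoostingClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "EasyEnsemble": EasyEnsembleClassifier(random_state=42),
    "BalancedRF": BalancedRandomForestClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    prob = model.predict_proba(X_te)[:, 1]  # predicted minority-class probability
    print(f"{name:>12}  F1={f1_score(y_te, pred):.3f}  "
          f"PR-AUC={average_precision_score(y_te, prob):.3f}  "  # area under the precision-recall curve
          f"G-mean={geometric_mean_score(y_te, pred):.3f}")      # sqrt(sensitivity * specificity)

On data this skewed, PR-AUC and G-mean separate the imbalance-aware ensembles from the plain ones far more clearly than overall accuracy would, which is why the study relies on these metrics rather than accuracy.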
 
Keywords: 
Imbalance; Ensemble; Bagging; Boosting; Resampling; Benchmark
 