Performance Assessment of Ensemble-Tree Learning Models on Breast Cancer Dataset

O. I. ALABI; O. J. FATABO; S. C. OKENU; T. A. ONIMISI; G. O. OYERINDE; I. J. UDOH; A. I. OZOEZE; A. C. EGBA

doi:10.34874/IMIST.PRSM/jis-v23i1.41823

Journal of Information Sciences (Jul 2024)

Affiliations

O. I. ALABI: Sheda Science and Technology Complex, F.C.T, Nigeria
O. J. FATABO: Sheda Science and Technology Complex, F.C.T, Nigeria
S. C. OKENU: Sheda Science and Technology Complex, F.C.T, Nigeria
T. A. ONIMISI: University of Aberdeen, Scotland
G. O. OYERINDE: Sheda Science and Technology Complex, F.C.T, Nigeria
I. J. UDOH: Sheda Science and Technology Complex, F.C.T, Nigeria
A. I. OZOEZE: Sheda Science and Technology Complex, F.C.T, Nigeria
A. C. EGBA: Sheda Science and Technology Complex, F.C.T, Nigeria

Journal volume & issue: Vol. 23, no. 1

Abstract

Advances in feature extraction make it possible to collect prognostic data values that can be used to distinguish between benign and malignant tumours. While single learning models are capable of making predictions, combining weak learners into an ensemble can improve predictive performance. This study evaluates and compares the performance of four selected ensemble-tree machine learning models applied to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset is split into a 60% training set and a 40% test set. A Random Forest classifier, an Extremely Randomized Trees classifier, a Gradient Boosting Machine classifier and an Extreme Gradient Boosting classifier were each initialized with 3 weak learners and fit to the training set, with subsequent predictions made on the test set. The evaluation metrics are Accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), Precision-Recall curves and F2 scores, followed by a stratified 5-fold cross-validation procedure. With greater weight given to Precision and Recall, the Extreme Gradient Boosting classifier and the Extremely Randomized Trees classifier performed best, with average accuracies of 0.9386 and 0.9460 respectively. Overall, the Extremely Randomized Trees classifier outperforms the other models, with an average F2 score of 0.4232.

Keywords: Breast cancer; Classification models; Tree-based Ensemble; Supervised learning
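The abstract reports F2 scores but does not spell out the formula. The sketch below assumes the standard F-beta definition, in which beta = 2 weights Recall more heavily than Precision. The function name `fbeta_score` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from the paper, and this is not necessarily how the authors computed their scores.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 weights recall above precision."""
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for the malignant class (illustration only).
tp, fp, fn = 80, 10, 5
f1 = fbeta_score(tp, fp, fn, beta=1.0)
f2 = fbeta_score(tp, fp, fn, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

Because Recall exceeds Precision in these made-up counts, F2 comes out higher than F1. In a diagnostic setting, weighting Recall penalizes missed malignant cases more than false alarms.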

Published in Journal of Information Sciences

ISSN: 1113-4844 (Print); 2820-6894 (Online)
Publisher: Ecole des Sciences de l'Information
Country of publisher: Morocco
LCC subjects: Bibliography. Library science. Information resources
Website: https://revues.imist.ma/index.php/JIS/
