Egyptian Informatics Journal (Jun 2024)

Addressing label noise in leukemia image classification using small loss approach and pLOF with weighted-average ensemble

  • Md. Tarek Aziz
  • S.M. Hasan Mahmud
  • Kah Ong Michael Goh
  • Dip Nandi

Journal volume & issue
Vol. 26
p. 100479

Abstract

Machine learning (ML) and deep learning (DL) models have been extensively explored for the early diagnosis of various cancers, including leukemia, and many achieve performance comparable to that of human experts. However, challenges such as limited image data, inaccurate annotations, and prediction reliability still hinder their broad adoption in trustworthy computer-aided diagnosis (CAD) systems. This paper introduces a novel weighted-average ensemble model for classifying Acute Lymphoblastic Leukemia, along with a reliable CAD system that combines the strengths of both ML and DL approaches. First, a variety of filtering methods are analyzed to determine the most suitable image representation, followed by data augmentation to expand the training data. Second, a modified, fine-tuned VGG-19 model is used as a feature extractor to obtain meaningful features from the training samples. Third, a small-loss approach and a probabilistic local outlier factor (pLOF) are developed on the extracted features to address label noise. Fourth, we propose a weighted-average ensemble model that uses the top five models as base learners, with weights calculated from their model uncertainty to ensure reliable predictions. Fifth, we calculate Shapley values, which are grounded in cooperative game theory, using SHAP and perform feature selection over different feature combinations to determine the optimal number of features. Finally, we integrate these strategies into an interpretable CAD system. This system not only predicts the disease but also generates Grad-CAM images to visualize potentially affected areas, enhancing both clarity and diagnostic insight. All of our code is available in the following repository: https://github.com/taareek/leukemia-classification
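The abstract does not state how the ensemble weights are derived from model uncertainty, so the following is only an illustrative sketch of one plausible scheme, not the authors' method. It assumes that uncertainty is measured as the mean predictive entropy of each base learner on held-out data and that weights are proportional to the inverse of that entropy. The function names `entropy_weights` and `weighted_average_ensemble` are hypothetical and are not taken from the linked repository.

```python
import numpy as np


def entropy_weights(val_probs):
    """Derive one weight per base learner from its mean predictive entropy.

    val_probs: array of shape (n_models, n_samples, n_classes) holding each
    model's class probabilities on held-out data.
    ASSUMPTION: inverse-entropy weighting; the paper may use another measure.
    """
    eps = 1e-12  # avoids log(0) and division by zero
    # Shannon entropy of each prediction: shape (n_models, n_samples)
    entropy = -np.sum(val_probs * np.log(val_probs + eps), axis=-1)
    # Average uncertainty of each model: shape (n_models,)
    mean_entropy = entropy.mean(axis=1)
    # Less uncertain models receive larger weights
    inverse = 1.0 / (mean_entropy + eps)
    return inverse / inverse.sum()


def weighted_average_ensemble(probs, weights):
    """Combine base-learner probabilities with a weighted average.

    probs: array of shape (n_models, n_samples, n_classes)
    weights: array of shape (n_models,) that sums to 1
    Returns an array of shape (n_samples, n_classes).
    """
    return np.tensordot(weights, probs, axes=1)
```

In this sketch, the weights would be computed once on a validation split and then reused at inference time; the predicted class for each sample is the `argmax` over the last axis of the ensemble output.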

Keywords