IEEE Access (Jan 2024)

LightGBM: A Leading Force in Breast Cancer Diagnosis Through Machine Learning and Image Processing

  • Bassam M. Kanber,
  • Ahmad Al Smadi,
  • Naglaa F. Noaman,
  • Bo Liu,
  • Shuiping Gou,
  • Mutasem K. Alsmadi

DOI
https://doi.org/10.1109/ACCESS.2024.3375755
Journal volume & issue
Vol. 12
pp. 39811 – 39832

Abstract

The early diagnosis of breast cancer (BC), a prominent global cause of mortality, calls for innovative diagnostic strategies. This study applies machine learning (ML) and advanced image processing techniques to histopathology images to support BC diagnosis. A robust feature extraction (FE) pipeline is developed that integrates color histogram analysis, contour-based FE, Hu moments, and Haralick texture features. Ten ML algorithms, including LightGBM (LGBM), CatBoost, and XGBoost, are systematically evaluated at each magnification level of the BreakHis dataset (40×, 100×, 200×, and 400×) to assess their diagnostic performance. The research introduces a novel approach that combines distinct FE techniques, improving the model's ability to distinguish between benign and malignant tissue. These integrated techniques significantly improve BC diagnostic accuracy and reliability and have the potential to benefit patient outcomes and healthcare systems. Notably, the combination of the FE pipeline and LGBM achieves the highest accuracy, reported both before augmentation (0.9598 at 40×, 0.9516 at 100×, 0.9652 at 200×, 0.9535 at 400×, and 0.9570 for all magnifications combined) and after augmentation (0.9949 at 40×, 0.9870 at 100×, 0.9987 at 200×, and 0.9918 at 400×) when classifying histopathological images at each magnification. The study thus highlights the crucial role of augmentation in further improving classification accuracy. Extending its applicability, the proposed method is also applied to the classification of lung and colon cancer images (LC25000 dataset), achieving an accuracy of 0.9983. These results demonstrate the method's effectiveness and adaptability for histopathological image classification.
This research contributes to the evolving field of BC diagnostics, offering a framework for robust and accurate ML-based diagnostic tools that may revolutionize cancer diagnosis and enhance patient care.
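As a rough illustration of how hand-crafted features of the kinds named above can be combined into one vector per image, the following NumPy sketch computes a per-channel color histogram, Hu's seven moment invariants, and five Haralick-style texture statistics from a gray-level co-occurrence matrix (GLCM). It is not the authors' pipeline: the histogram bin count, grayscale weights, GLCM offset (one pixel to the right), quantization to 8 gray levels, the chosen subset of Haralick statistics, and the helper names (`to_gray`, `color_histogram`, `hu_moments`, `glcm_features`, `extract_features`) are all assumptions for illustration, and contour features are omitted.

```python
import numpy as np


def to_gray(img: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 uint8 RGB image to uint8 grayscale (assumed BT.601 weights)."""
    return np.round(img[..., :3] @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)


def color_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel intensity histogram, each channel normalized to sum to 1."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


def hu_moments(gray: np.ndarray) -> np.ndarray:
    """Hu's seven invariants from intensity-weighted normalized central moments."""
    g = gray.astype(np.float64)
    m00 = g.sum()
    if m00 == 0:  # all-black image: moments are undefined, return zeros
        return np.zeros(7)
    ys, xs = np.mgrid[: g.shape[0], : g.shape[1]]
    xc, yc = (xs * g).sum() / m00, (ys * g).sum() / m00

    def eta(p: int, q: int) -> float:
        # Normalized central moment: mu_pq / mu_00^(1 + (p + q) / 2)
        mu = ((xs - xc) ** p * (ys - yc) ** q * g).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = a ** 2 + b ** 2
    h5 = (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2)
    h6 = (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b
    h7 = (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2)
    return np.array([h1, h2, h3, h4, h5, h6, h7])


def glcm_features(gray: np.ndarray, levels: int = 8) -> np.ndarray:
    """Contrast, energy (ASM), homogeneity (IDM), entropy, correlation from one GLCM."""
    q = (gray.astype(np.int64) * levels) // 256   # quantize 0..255 into 0..levels-1
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()     # horizontally adjacent pixel pairs
    P = np.zeros((levels, levels))
    np.add.at(P, (a, b), 1)
    P = P + P.T                                    # symmetric: count both directions
    P /= P.sum()
    i, j = np.indices(P.shape)
    contrast = (P * (i - j) ** 2).sum()
    asm = (P ** 2).sum()
    homogeneity = (P / (1 + (i - j) ** 2)).sum()
    nz = P[P > 0]
    entropy = -(nz * np.log2(nz)).sum()
    mu = (i * P).sum()                             # symmetric GLCM: row and column means match
    var = ((i - mu) ** 2 * P).sum()
    # Correlation is undefined for a flat image; 0 is used here by convention
    corr = ((i - mu) * (j - mu) * P).sum() / var if var > 0 else 0.0
    return np.array([contrast, asm, homogeneity, entropy, corr])


def extract_features(img: np.ndarray) -> np.ndarray:
    """Concatenate color, moment, and texture features into one vector."""
    gray = to_gray(img)
    return np.concatenate([color_histogram(img), hu_moments(gray), glcm_features(gray)])


rng = np.random.default_rng(0)
demo = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(extract_features(demo).shape)  # (36,)
```

In a full pipeline, one such vector per image would be stacked into a feature matrix and passed to a gradient-boosted tree classifier such as `lightgbm.LGBMClassifier`. Hu moments are also commonly log-scaled before training because their magnitudes span many orders. Both points describe general practice, not settings reported in the paper.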

Keywords