IEEE Access (Jan 2025)

LASSO-mCGA: Machine Learning and Modified Compact Genetic Algorithm-Based Biomarker Selection for Breast Cancer Subtype Classification

  • Nimisha Ghosh,
  • Sankar Kumar Mridha,
  • Rourab Paul

DOI
https://doi.org/10.1109/ACCESS.2025.3532361
Journal volume & issue
Vol. 13
pp. 17673 – 17682

Abstract

Breast cancer is the most common cancer type among females and one of the leading causes of death worldwide. Because breast cancer is a heterogeneous disease, subtyping plays a vital role in its treatment. Gene expression is important for subtyping, so this work uses gene expression data to identify the most significant gene biomarkers. The identified biomarkers are highly associated with the individual breast cancer subtypes: Luminal A (LumA), Luminal B (LumB), HER2-Enriched (HER2) and Basal-Like (Basal). To identify such biomarkers, LASSO is first applied to the dataset in combination with four machine learning models, namely Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbours (KNN) and Naive Bayes (NB), to find an initial reduced set of genes and to select the best learning model by classification accuracy, which is SVM in this case. Thereafter, a Modified Compact Genetic Algorithm (mCGA) is applied to identify the final set of genes as biomarkers for each specific subtype. Experimental results suggest that our proposed method achieves AUC-ROC values of 0.9878 and 0.97311 for LumA and LumB, respectively, and 1 for the Basal and HER2 subtypes. To validate the biological significance of the identified biomarkers, KEGG pathway and GO enrichment analyses are carried out.
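
The second stage described above can be illustrated with a minimal sketch of a standard compact genetic algorithm, which evolves a vector of per-gene selection probabilities instead of an explicit population. This is not the authors' mCGA: the abstract does not state their modifications, so none are reproduced here. The names `compact_ga`, `toy_fitness` and `INFORMATIVE`, the parameter values, and the toy fitness function are all assumptions made for illustration; in the setting of the abstract, the fitness would presumably be the classification performance of the selected model (SVM) on the chosen gene subset.

```python
import random

def compact_ga(n_bits, fitness, pop_size=50, max_iters=5000, seed=0):
    """Standard compact GA (illustrative only, not the paper's mCGA)."""
    rng = random.Random(seed)
    # counts[i] / pop_size is the probability that gene i is selected.
    counts = [pop_size // 2] * n_bits
    for _ in range(max_iters):
        # Sample two candidate gene subsets from the probability vector.
        a = [1 if rng.random() * pop_size < c else 0 for c in counts]
        b = [1 if rng.random() * pop_size < c else 0 for c in counts]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Where the two candidates disagree, shift the probability
        # one step (1 / pop_size) toward the winner's choice.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] == 1 else -1
        # Stop once every probability has converged to 0 or 1.
        if all(c in (0, pop_size) for c in counts):
            break
    return [1 if 2 * c >= pop_size else 0 for c in counts]

# Hypothetical toy problem: genes 1, 4 and 7 are "informative".
INFORMATIVE = {1, 4, 7}

def toy_fitness(mask):
    # Reward informative genes; charge a small penalty per selected gene.
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

selected = compact_ga(10, toy_fitness)
print([i for i, bit in enumerate(selected) if bit])
```

Because only the probability vector is stored, memory use grows with the number of candidate genes and not with the simulated population size, which is the usual motivation for compact variants when the candidate set is large.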

Keywords