Improving the robustness and stability of a machine learning model for breast cancer prognosis through the use of multi-modal classifiers

Nikhilanand Arya; Sriparna Saha; Archana Mathur; Snehanshu Saha

doi:10.1038/s41598-023-30143-8

Scientific Reports (Mar 2023)

Affiliations

Nikhilanand Arya: Department of Computer Science & Engineering, Indian Institute of Technology
Sriparna Saha: Department of Computer Science & Engineering, Indian Institute of Technology
Archana Mathur: Department of Information Science & Engineering, Nitte Meenakshi Institute of Technology
Snehanshu Saha: APPCAIR & CSIS, Birla Institute of Technology and Science

DOI: https://doi.org/10.1038/s41598-023-30143-8
Journal volume & issue: Vol. 13, no. 1, pp. 1–10

Abstract

Breast cancer is a deadly disease with a high mortality rate across cancer types. Advances in biomedical information retrieval have helped in developing early prognosis and diagnosis systems for cancer patients. These systems give the oncologist information from several modalities with which to devise a correct and feasible treatment plan for breast cancer patients and to spare them unnecessary therapies and their toxic side effects. Patient information can be collected from modalities such as clinical data, copy number variation, DNA methylation, microRNA sequencing, gene expression, and histopathological whole slide images. The high dimensionality and heterogeneity of these modalities call for intelligent systems that can identify the features relevant to prognosis and diagnosis and make correct predictions. In this work, we study end-to-end systems with two main components: (a) dimensionality reduction applied to the original features of each modality and (b) classification applied to the fusion of the reduced feature vectors, which automatically assigns breast cancer patients to one of two categories: short-time or long-time survivors. Principal component analysis (PCA) and variational auto-encoders (VAEs) serve as the dimensionality reduction techniques, followed by support vector machines (SVM) or random forest as the machine learning classifiers. The study uses raw, PCA-extracted, and VAE-extracted features of the TCGA-BRCA dataset from six modalities as input to the classifiers. We conclude that adding more modalities supplies complementary information to the classifiers and increases their stability and robustness. The multimodal classifiers have not been validated prospectively on primary data.
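The two-stage design described in the abstract (per-modality dimensionality reduction, then classification on the fused reduced vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the data are synthetic random arrays rather than TCGA-BRCA, only three hypothetical modalities are simulated, fusion is plain concatenation, the component count and classifier settings are arbitrary, and only the PCA branch is shown (the VAE branch is omitted). It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_patients = 120

# Synthetic binary labels: 0 = short-time survivor, 1 = long-time survivor.
labels = rng.integers(0, 2, size=n_patients)

# Synthetic stand-ins for three modalities (hypothetical sizes, NOT real data).
modalities = {
    "clinical": rng.normal(size=(n_patients, 20)),
    "gene_expression": rng.normal(size=(n_patients, 300)),
    "copy_number_variation": rng.normal(size=(n_patients, 200)),
}
# Inject a weak label-dependent shift so the task is learnable.
for X in modalities.values():
    X[:, :5] += labels[:, None] * 1.0

# One shared patient split, so every modality uses the same train/test rows.
train_idx, test_idx = train_test_split(
    np.arange(n_patients), test_size=0.25, stratify=labels, random_state=0
)


def reduce_modality(X, n_components=10):
    """Standardize and project one modality with PCA fitted on training rows only."""
    scaler = StandardScaler().fit(X[train_idx])
    pca = PCA(n_components=n_components, random_state=0)
    pca.fit(scaler.transform(X[train_idx]))
    return pca.transform(scaler.transform(X))


# (a) Dimensionality reduction per modality, then fusion by concatenation.
reduced = [reduce_modality(X) for X in modalities.values()]
fused = np.hstack(reduced)

# (b) Classification on the fused representation.
classifiers = {
    "svm": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, clf in classifiers.items():
    clf.fit(fused[train_idx], labels[train_idx])
    scores[name] = clf.score(fused[test_idx], labels[test_idx])

for name, acc in scores.items():
    print(f"{name}: held-out accuracy = {acc:.2f}")
```

Fitting the scaler and PCA on training rows only keeps test patients out of the learned projection. Dropping entries from `modalities` and rerunning is one simple way to probe how the number of modalities affects the scores on this toy data; the numbers it prints say nothing about the results reported in the paper.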

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
