Machine Learning Model for Multiomics Biomarkers Identification for Menopause Status in Breast Cancer

Firas Alghanim; Ibrahim Al-Hurani; Hazem Qattous; Abdullah Al-Refai; Osamah Batiha; Abedalrhman Alkhateeb; Salama Ikki

doi:10.3390/a17010013

Algorithms (Dec 2023)

Affiliations

Firas Alghanim: King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman P.O. Box 1438, Jordan
Ibrahim Al-Hurani: Department of Electrical Engineering, Lakehead University, Thunder Bay, ON P7B 5E1, Canada
Hazem Qattous: King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman P.O. Box 1438, Jordan
Abdullah Al-Refai: King Hussein School of Computing Sciences, Princess Sumaya University for Technology, Al-Jubaiha, Amman P.O. Box 1438, Jordan
Osamah Batiha: Department of Biotechnology and Genetic Engineering, Jordan University of Science and Technology, Irbid P.O. Box 3030, Jordan
Abedalrhman Alkhateeb: Computer Science Department, Lakehead University, Thunder Bay, ON P7B 5E1, Canada
Salama Ikki: Department of Electrical Engineering, Lakehead University, Thunder Bay, ON P7B 5E1, Canada

DOI: https://doi.org/10.3390/a17010013
Journal volume & issue: Vol. 17, no. 1, p. 13

Abstract

Identifying menopause-related breast cancer biomarkers is crucial for improving diagnosis, prognosis, and personalized treatment at that stage of a patient’s life. In this paper, we present a comprehensive framework for extracting multiomics biomarkers specifically related to breast cancer incidence before and after menopause. Our approach integrates DNA methylation, gene expression, and copy number alteration data through a systematic pipeline comprising data preprocessing, class-imbalance handling, dimensionality reduction, and classification. The framework starts with MutSigCV for data preprocessing and quality control. The Synthetic Minority Over-sampling Technique (SMOTE) is then applied to address the class imbalance. Next, Principal Component Analysis (PCA) transforms the DNA methylation, gene expression, and copy number alteration data into a latent space, discarding irrelevant variation while retaining the relevant information. Finally, a classification model is built on the transformed multiomics data, combined into a unified representation. The framework contributes to understanding the complex interplay between menopause and breast cancer and may thereby support more precise diagnostic and therapeutic strategies in the future. Shapley-based explainable artificial intelligence analysis, built on an XGBoost regressor, showed the power of the selected gene expression features for predicting menopause status; the potential biomarkers included RUNX1, PTEN, MAP3K1, and CDH1. These findings are supported by the literature.
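
The abstract names SMOTE as the class-balancing step but gives no implementation details. As a rough illustration of the idea only, the sketch below generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It assumes binary labels and Euclidean distance; the function name `smote_oversample`, the parameter values, and the toy data are hypothetical and are not taken from the paper. In practice a maintained library implementation (for example, the one in imbalanced-learn) would normally be used instead.

```python
import math
import random


def smote_oversample(X, y, minority_label, k=3, seed=0):
    """Return (X_new, y_new) with the minority class up-sampled to parity.

    Assumes two classes. Each synthetic row is a random point on the segment
    between a minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    if n_needed <= 0 or len(minority) < 2:
        return list(X), list(y)  # already balanced, or nothing to interpolate

    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen one.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])

    return list(X) + synthetic, list(y) + [minority_label] * n_needed


# Toy feature rows (hypothetical values): 6 "post" vs 3 "pre" samples.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
y = ["post"] * 6 + ["pre"] * 3
X_bal, y_bal = smote_oversample(X, y, minority_label="pre", k=2)
print(y_bal.count("post"), y_bal.count("pre"))
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region already occupied by the minority class. Applying the up-sampling to training data only is the usual precaution against leaking synthetic samples into evaluation.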

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
