Improving prediction of blood cancer using leukemia microarray gene data and Chi2 features with weighted convolutional neural network

Ebtisam Abdullah Alabdulqader; Aisha Ahmed Alarfaj; Muhammad Umer; Ala’ Abdulmajid Eshmawi; Shtwai Alsubai; Tai-hoon Kim; Imran Ashraf

doi:10.1038/s41598-024-65315-7

Scientific Reports (Jul 2024)

Affiliations

Ebtisam Abdullah Alabdulqader: Department of Information Technology, College of Computer and Information Sciences, King Saud University
Aisha Ahmed Alarfaj: Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University
Muhammad Umer: Department of Computer Science and Information Technology, The Islamia University of Bahawalpur
Ala’ Abdulmajid Eshmawi: Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah
Shtwai Alsubai: Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University
Tai-hoon Kim: School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University
Imran Ashraf: Department of Information and Communication Engineering, Yeungnam University

Journal volume & issue: Vol. 14, no. 1, pp. 1–15

Abstract

Blood cancer has emerged as a growing concern over the past decade, and early diagnosis is essential for timely and effective treatment. The current diagnostic process, which involves a battery of tests and medical experts, is costly and time-consuming, so an automated diagnostic system that makes accurate predictions is needed. Machine learning applied to leukemia microarray gene data is an active area of medical research for blood cancer diagnosis, yet despite extensive work, further improvement in accuracy and efficacy is still required. This work presents a supervised machine-learning approach for blood cancer prediction using leukemia microarray gene data comprising 22,283 genes. Chi-squared (Chi2) feature selection and synthetic minority oversampling technique (SMOTE)-Tomek resampling are used to address the high dimensionality and class imbalance of the dataset: SMOTE-Tomek generates synthetic samples to balance the target classes, and Chi2 selects the most important of the 22,283 genes for training the learning models. A novel weighted convolutional neural network (CNN) model, which draws on three separate CNN models, is proposed for classification. To evaluate the proposed approach, extensive experiments are carried out on the datasets, including a performance comparison with state-of-the-art techniques. The weighted CNN outperforms the other models when coupled with SMOTE-Tomek and Chi2, achieving 99.9% accuracy. Results from k-fold cross-validation further confirm the superior performance of the proposed model.
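
The two ideas named in the abstract, Chi2 feature ranking and a weighted combination of three classifiers, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy data, and the ensemble weights are all hypothetical, the three "models" are stand-in probability arrays rather than CNNs, and the SMOTE-Tomek resampling step is omitted. The weighting scheme shown (a normalized weighted average of class probabilities) is an assumption, since the abstract does not specify how the three CNN outputs are combined.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-squared score of each non-negative feature against the class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    one_hot = (y[:, None] == classes[None, :]).astype(float)  # shape (n, k)
    observed = one_hot.T @ X                # per-class feature sums, shape (k, d)
    feature_total = X.sum(axis=0)           # shape (d,)
    class_prob = one_hot.mean(axis=0)       # shape (k,)
    expected = np.outer(class_prob, feature_total)
    expected = np.maximum(expected, 1e-12)  # guard against all-zero features
    return ((observed - expected) ** 2 / expected).sum(axis=0)

def select_top_k(X, y, k):
    """Indices of the k features with the highest Chi2 score."""
    scores = chi2_scores(X, y)
    return [int(i) for i in np.argsort(scores)[::-1][:k]]

def weighted_vote(prob_list, weights):
    """Combine per-model class probabilities with a normalized weighted average.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   one hypothetical weight per model (assumed, not from the paper).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    stacked = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (m, n, k)
    averaged = np.tensordot(w, stacked, axes=1)                          # (n, k)
    return averaged.argmax(axis=1)

# Toy data: feature 0 separates the classes, feature 1 is constant.
X_toy = [[5, 1], [6, 1], [0, 1], [1, 1]]
y_toy = [1, 1, 0, 0]
top_features = select_top_k(X_toy, y_toy, k=1)

# Stand-in outputs of three models for a single sample.
p1, p2, p3 = [[0.6, 0.4]], [[0.4, 0.6]], [[0.3, 0.7]]
weighted_pred = weighted_vote([p1, p2, p3], weights=[0.7, 0.2, 0.1])
uniform_pred = weighted_vote([p1, p2, p3], weights=[1, 1, 1])
```

In the toy case, giving the first model a larger weight changes the ensemble's decision relative to a uniform average, which is the effect a weighted ensemble relies on.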

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
