A performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction

Md Faisal Kabir; Tianjie Chen; Simone A. Ludwig

Healthcare Analytics (Nov 2023)

Affiliations

Md Faisal Kabir: Department of Computer Science, Pennsylvania State University Harrisburg, 777 W Harrisburg Pike, Middletown, 17057, PA, USA; Corresponding author.
Tianjie Chen: Department of Computer Science, Pennsylvania State University Harrisburg, 777 W Harrisburg Pike, Middletown, 17057, PA, USA
Simone A. Ludwig: Department of Computer Science, North Dakota State University, 1340 Administration Ave, Fargo, 58105, ND, USA

Journal volume & issue: Vol. 3, p. 100125

Abstract


Developments in technology facilitate the use of machine learning methods in medical fields. In cancer research, combining machine learning tools with gene expression data has proven effective at detecting cancer patients. However, processing such high-dimensional and complex data remains a challenge. This paper analyzes the impact of different dimensionality reduction techniques on machine learning models used for cancer prediction. Principal component analysis (PCA), kernel PCA, and an autoencoder were used to reduce the dimensionality of RNA sequencing data. Two machine learning classifiers, a neural network and a support vector machine, were trained and tested on the original, dimensionally reduced, and cancer-relevant data. Classifier performance was assessed with accuracy, precision, recall, F-measure, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The results showed that dimensionality reduction improved classifier performance, and that the autoencoder outperformed both PCA and kernel PCA. These findings indicate that dimensionality reduction can improve the results of machine learning classification models on high-dimensional data.
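The general pipeline described in the abstract (reduce dimensionality, train a classifier, score it with accuracy, precision, recall and F-measure) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not from the paper. The data are synthetic and hypothetical, with many more features than samples, standing in for RNA sequencing data. Only linear PCA (computed via SVD) is shown; kernel PCA and the autoencoder are not covered. A simple nearest-centroid classifier is swapped in for the paper's neural network and support vector machine. The toy results say nothing about the paper's findings.

```python
import numpy as np

# Hypothetical toy stand-in for RNA-seq data: far more features than samples,
# with only the first 50 features carrying a class-dependent shift.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 200, 1000, 50
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, :n_informative] += 2.0 * y[:, None]

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]


def pca_fit(X, n_components):
    """Learn the mean and the top principal directions from training data via SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions sorted by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project (possibly unseen) samples onto the learned components."""
    return (X - mean) @ components.T


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not the paper's NN/SVM): one mean vector per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(X, classes, centroids):
    # Squared Euclidean distance from every sample to every class centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a two-class problem."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(precision),
        "recall": float(recall),
        "f_measure": float(f_measure),
    }


# Fit PCA on the training split only, then reuse it for the test split
# so that no test information leaks into the reduction.
mean, components = pca_fit(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)

results = {}
for name, (A_train, A_test) in {
    "original": (X_train, X_test),
    "pca-10": (Z_train, Z_test),
}.items():
    classes, centroids = nearest_centroid_fit(A_train, y_train)
    y_pred = nearest_centroid_predict(A_test, classes, centroids)
    results[name] = binary_metrics(y_test, y_pred)
    print(name, A_train.shape[1], "features:", results[name])
```

Fitting the projection on the training split alone mirrors how a reduction step would be applied to unseen patients. Replacing `pca_fit` with a kernel PCA or an autoencoder would change only the reduction step; the classifier and metric code stay the same.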

Published in Healthcare Analytics

ISSN: 2772-4425 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/healthcare-analytics
