Healthcare Analytics (Nov 2023)

A performance analysis of dimensionality reduction algorithms in machine learning models for cancer prediction

  • Md Faisal Kabir,
  • Tianjie Chen,
  • Simone A. Ludwig

Journal volume & issue
Vol. 3
p. 100125

Abstract


Developments in technology facilitate the use of machine learning methods in medical fields. In cancer research, the combination of machine learning tools and gene expression data has proven able to identify cancer patients. However, processing such high-dimensional and complex data is still a challenge. This paper analyzed the impact that different dimensionality reduction techniques have on machine learning models used for cancer prediction. Dimensionality reduction techniques such as principal component analysis (PCA), PCA with a kernel, and an autoencoder were used to reduce the dimensionality of the RNA sequencing data. Two machine learning classifiers, a neural network and a support vector machine, were trained and tested on the original, dimensionally reduced, and cancer-relevant data. Classifier performance was assessed with accuracy, precision, recall, F-measure, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The results showed that dimensionality reduction improved the performance of the classifiers. Additionally, the autoencoder outperformed both PCA and PCA with a kernel. These findings indicate the potential of dimensionality reduction to improve the results of machine learning classification models on high-dimensional data.
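The abstract names its evaluation metrics without defining them. The following is a minimal sketch of how they can be computed. It assumes a binary cancer-versus-normal labeling with 1 as the positive (cancer) class, which the abstract does not state. The function names and the toy labels and scores are hypothetical illustrations, not code or data from the paper. AUC is computed here through its rank interpretation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels are 0/1 with 1 = cancer (positive class)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0 under the 0/1 assumption
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F-measure (F1) from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f_measure": f_measure}


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the decision threshold down through each distinct score."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: classifier scores for six samples, thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
curve = roc_points(y_true, scores)
```

In this toy case two of three positives and two of three negatives are classified correctly, so accuracy, precision, recall and F-measure all equal 2/3, while the AUC is 8/9 because eight of the nine positive/negative score pairs are ordered correctly. Integrating the ROC points with the trapezoidal rule gives the same 8/9.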

Keywords