IEEE Access (Jan 2024)

Health-Related Data Analysis Using Metaheuristic Optimization and Machine Learning

  • Annisa Darmawahyuni,
  • Siti Nurmaini,
  • Bambang Tutuko,
  • Muhammad Naufal Rachmatullah,
  • Firdaus Firdaus,
  • Ade Iriani Sapitri,
  • Anggun Islami,
  • Jordan Marcelino,
  • Rendy Isdwanta,
  • Muhammad Irfan Karim

DOI
https://doi.org/10.1109/ACCESS.2024.3390008
Journal volume & issue
Vol. 12
pp. 55342 – 55356

Abstract

Health-related data plays a decisive role in disease diagnosis. Extracting relevant information from medical records depends on evaluating the features of the data, and prior research has shown that feature selection (FS) significantly affects outcomes across different medical domains. FS identifies the most significant features in order to improve classification accuracy; it aims to minimize the number of input variables and the computational overhead while maximizing classification performance. However, identifying the optimal features is difficult because health-related data typically combines high dimensionality (many features) with a small sample size. Metaheuristic optimization algorithms (MOAs) play an important role in generating the best feature subsets through their exploration and exploitation phases. This study experiments with well-known MOAs and supervised learning on datasets of varying feature dimensions from the UC Irvine Machine Learning Repository, PhysioNet, the Kent Ridge Bio-Medical Dataset, and the MIMIC-III v1.4 repository. To improve the quality of health-related data, this study proposes missing-data imputation based on a deep learning approach, an autoencoder (AE). AE imputation achieves a mean squared error (MSE) of 0.0167 and a root mean squared error (RMSE) of 0.129. The results show that MOAs select a minimal number of features while still maintaining outstanding performance, and that they can be applied successfully to diverse health-related datasets with both low- and high-dimensional data.
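
The trade-off described in the abstract, between few features and high classification performance, can be illustrated with a minimal sketch. It is not the authors' method. It assumes a weighted-sum fitness (the weight `alpha` is an assumed value), uses a simple bit-flip local search as a stand-in for the MOAs studied in the paper, and replaces a real classifier with a hypothetical `toy_error` function in which only features 0 and 3 are informative.

```python
import random

def fitness(mask, error_fn, alpha=0.99):
    """Lower is better: weighted sum of classification error and the
    fraction of features kept. alpha is an assumed weighting."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be evaluated
    return alpha * error_fn(mask) + (1 - alpha) * n_selected / len(mask)

def bit_flip_search(n_features, error_fn, n_iter=200, seed=0):
    """Stand-in search: flip one random bit per step and keep the
    candidate if its fitness is no worse than the current best."""
    rng = random.Random(seed)
    best = [1] * n_features  # start with every feature selected
    best_fit = fitness(best, error_fn)
    for _ in range(n_iter):
        cand = best[:]
        j = rng.randrange(n_features)
        cand[j] = 1 - cand[j]
        cand_fit = fitness(cand, error_fn)
        if cand_fit <= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

def toy_error(mask):
    """Hypothetical error model: dropping an informative feature
    (index 0 or 3) adds 0.2 to a baseline error of 0.05."""
    informative = (0, 3)
    missing = sum(1 for i in informative if not mask[i])
    return 0.05 + 0.2 * missing

best_mask, best_fit = bit_flip_search(8, toy_error)
print(best_mask, round(best_fit, 4))
```

In practice `toy_error` would be replaced by the cross-validated error of a supervised classifier trained on the selected columns, and the local search by a population-based metaheuristic that balances exploration and exploitation.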

Keywords