Scientific Reports (Oct 2024)

An intelligent learning system based on electronic health records for unbiased stroke prediction

  • Muhammad Asim Saleem,
  • Ashir Javeed,
  • Wasan Akarathanawat,
  • Aurauma Chutinet,
  • Nijasri Charnnarong Suwanwela,
  • Pasu Kaewplung,
  • Surachai Chaitusaney,
  • Sunchai Deelertpaiboon,
  • Wattanasak Srisiri,
  • Watit Benjapolakul

DOI
https://doi.org/10.1038/s41598-024-73570-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 14

Abstract

Stroke has a negative impact on people’s lives and is one of the leading causes of death and disability worldwide. Early detection of symptoms can significantly help predict stroke and promote a healthy lifestyle. Researchers have developed several methods to predict stroke using machine learning (ML) techniques, but the proposed systems suffer from two main problems. The first is that the ML models are biased because the classes in the dataset are unevenly distributed; recent research has not adequately addressed this problem, and no preventive measures have been taken. We use the Synthetic Minority Oversampling Technique (SMOTE) to balance the training data and reduce this bias. The second is the low classification accuracy of existing ML models. To increase accuracy, we propose a learning system that combines an autoencoder with a linear discriminant analysis (LDA) model for stroke prediction. The autoencoder extracts relevant features from the feature space, and the extracted subset is then fed into the LDA model for stroke classification. The hyperparameters of the LDA model are tuned using a grid search strategy. Because the conventional accuracy metric alone does not truly reflect the performance of ML models, we validated the proposed model with several evaluation metrics: accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The experimental results show that the proposed model achieves a sensitivity of 98.51% and a specificity of 97.56%, with an accuracy of 99.24% and a balanced accuracy of 98.00%.
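
The abstract's point that plain accuracy can mislead on an imbalanced dataset can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `summarize`) and the toy labels are hypothetical, chosen only to show how sensitivity, specificity, and balanced accuracy are computed from a confusion matrix.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy data: 95 non-stroke cases (0) and 5 stroke cases (1).
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts "no stroke".
y_pred = [0] * 100

metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the always-negative classifier scores 95% accuracy while detecting no stroke cases at all, so its balanced accuracy is only 50%. Applying the same formula to the reported sensitivity and specificity gives (98.51% + 97.56%) / 2 ≈ 98.04%, in line with the reported balanced accuracy of 98.00%.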

Keywords