Journal of Big Data (Jun 2022)

Machine learning model for malaria risk prediction based on mutation location of large-scale genetic variation data

  • Kah Yee Tai,
  • Jasbir Dhaliwal

DOI
https://doi.org/10.1186/s40537-022-00635-x
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 22

Abstract

Read online

Abstract In recent malaria research, the complexity of the disease has been explored using machine learning models via blood smear images, environmental, and even RNA-Seq data. However, a machine learning model based on genetic variation data is still required to fully explore individual malaria risk. Furthermore, many Genome-Wide Associations Studies (GWAS) have associated specific genetic markers, i.e., single nucleotide polymorphisms (SNPs), with malaria. Thus, the present study improves the current state-of-the-art genetic risk score by incorporating SNPs mutation location on large-scale genetic variation data obtained from GWAS. Nevertheless, it becomes computationally expensive for hyperparameter optimization on large-scale datasets. Therefore, this study proposes a machine learning model that incorporates mutation location as well as a Genetic Algorithm (GA) to optimize hyperparameters. Besides that, a deep learning model is also proposed to predict individual malaria risk as an alternative approach. The analysis is performed on the Malaria Genomic Epidemiology Network (MalariaGEN) dataset comprising 20,817 individuals from 11 populations. The findings of this study demonstrated that the proposed GA could overcome the curse of dimensionality and improve resource efficiency compared to commonly used methods. In addition, incorporating the mutation location significantly improved the machine learning models in predicting the individual malaria risk; a Mean Absolute Error (MAE) score of 8.00E−06. Moreover, the deep learning model obtained almost similar MAE scores to the machine learning models, indicating an alternative approach. Thus, this study provides relevant knowledge of genetic and technical deliberations that can improve the state-of-the-art methods for predicting individual malaria risk.

Keywords