Informatics in Medicine Unlocked (Jan 2022)

Identification of blood-based transcriptomics biomarkers for Alzheimer's disease using statistical and machine learning classifier

  • Mohammad Nasir Abdullah,
  • Yap Bee Wah,
  • Abu Bakar Abdul Majeed,
  • Yuslina Zakaria,
  • Norshahida Shaadan

Journal volume & issue
Vol. 33
p. 101083

Abstract

Alzheimer's disease (AD) is a neurodegenerative disorder characterised by gradual memory loss, impaired cognitive function, and progressive disability. This study aims to identify potential blood-based transcriptomics biomarkers that characterise AD patients in Malaysia. The sample comprises 92 AD patients and 92 non-AD subjects, with 22,254 genes. Boruta feature selection, a method for reducing the dimensionality of the transcriptomics dataset, selected 68 genes. The classification performance of four statistical classifiers and three machine learning (ML) classifiers was evaluated based on sensitivity, precision, accuracy, and F-measure. On the test set, elastic net logistic regression (LR) (mean = 0.9, sd = 0.05) and random forest (mean = 0.79, sd = 0.06) had the highest F-measures among the classifiers, while naïve Bayes had the lowest (mean = 0.74, sd = 0.07). The elastic net LR results indicated 16 potential biomarkers for AD patients in Malaysia (4 novel biomarkers, 7 upregulated biomarkers, and 5 downregulated biomarkers). The elastic net LR model with these 16 transcript genes achieved 81.59% accuracy and 85.19% sensitivity, with an F-measure of 0.8159.
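
The four evaluation measures named above can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, precision, accuracy and F-measure
    from binary confusion-matrix counts (AD = positive class)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Sensitivity (recall): share of true AD cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted AD cases that are truly AD.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Accuracy: share of all subjects classified correctly.
    accuracy = (tp + tn) / total
    # F-measure: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=9, fp=3, tn=7, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of precision and sensitivity, which makes it a stricter summary than accuracy when the two diverge.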

Keywords