Mathematics (Jun 2022)

Unsupervised Learning for Feature Representation Using Spatial Distribution of Amino Acids in Aldehyde Dehydrogenase (ALDH2) Protein Sequences

  • Monika Khandelwal,
  • Sabha Sheikh,
  • Ranjeet Kumar Rout,
  • Saiyed Umer,
  • Saurav Mallik,
  • Zhongming Zhao

DOI
https://doi.org/10.3390/math10132228
Journal volume & issue
Vol. 10, no. 13
p. 2228

Abstract

The aldehyde dehydrogenase 2 (ALDH2) enzyme is required for alcohol detoxification. ALDH2 belongs to the aldehyde dehydrogenase family, which forms the most important oxidative pathway of alcohol metabolism. The two main liver isoforms of aldehyde dehydrogenase are cytosolic and mitochondrial. Approximately 50% of East Asians have ALDH2 deficiency (an inactive mitochondrial isozyme), resulting from the substitution of lysine (K) for glutamate (E) at position 487 (E487K). ALDH2 deficiency is also known as Alcohol Flushing Syndrome or Asian Glow. People with ALDH2 deficiency experience facial flushing after drinking alcohol and are more susceptible to various diseases than people with normal ALDH2. This study performed a machine learning analysis of ALDH2 sequences from thirteen other species, comparing them with the human ALDH2 sequence. Based on several quantitative metrics (physicochemical properties, secondary structure, Hurst exponent, Shannon entropy, and fractal dimension), the fourteen species were grouped into four clusters using an unsupervised machine learning algorithm (K-means clustering). We also analyzed these species using hierarchical (agglomerative) clustering and constructed phylogenetic trees. The results show that Homo sapiens is more closely related to Bos taurus and Sus scrofa than to the other species studied. Our experimental results suggest that candidate medicines for alleviating the effects of ALDH2 deficiency could be tested on these species before being tested in humans.
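
As an illustration of one of the quantitative metrics named above, the Shannon entropy of a protein sequence can be computed from its residue frequencies. The following is a minimal sketch only, not the authors' implementation: the function name `shannon_entropy` and the toy fragments are hypothetical, and the abstract does not state which logarithm base or normalization the study used (base 2 is assumed here).

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy, in bits per residue, of the residue frequencies.

    Assumption: base-2 logarithm and no normalization; the study's exact
    definition is not given in the abstract.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    total = len(sequence)
    # H = sum over residues of p * log2(1 / p), with p = count / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical toy fragments for illustration; these are NOT real ALDH2 sequences.
toy_fragments = {
    "uniform": "ACDEFGHIKLMNPQRSTVWY",     # each of the 20 standard residues once
    "repetitive": "AAAAAAAAAAKKKKKKKKKK",  # two residues with equal frequency
}

for name, fragment in toy_fragments.items():
    print(name, round(shannon_entropy(fragment), 4))
```

A per-species value of this kind could serve as one column of a feature table, alongside the other metrics, before any clustering step is applied.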

Keywords