PLoS Computational Biology (Nov 2020)

Identifying longevity associated genes by integrating gene expression and curated annotations.

  • F William Townes,
  • Kareem Carr,
  • Jeffrey W Miller

DOI: https://doi.org/10.1371/journal.pcbi.1008429
Journal volume & issue: Vol. 16, no. 11, p. e1008429

Abstract

Aging is a complex process with poorly understood genetic mechanisms. Recent studies have sought to classify genes as pro-longevity or anti-longevity using a variety of machine learning algorithms. However, it is not clear which types of features are best for optimizing classification performance and which algorithms are best suited to this task. Further, performance assessments based on held-out test data are lacking. We systematically compare five popular classification algorithms using gene ontology and gene expression datasets as features to predict the pro-longevity versus anti-longevity status of genes for two model organisms (C. elegans and S. cerevisiae) using the GenAge database as ground truth. We find that elastic net penalized logistic regression performs particularly well at this task. Using elastic net, we make novel predictions of pro- and anti-longevity genes that are not currently in the GenAge database.
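
For readers unfamiliar with the method named above, elastic net penalized logistic regression is commonly written as the optimization problem below. This is a sketch of the standard textbook form, not necessarily the exact parametrization used in the paper. The notation is assumed here rather than taken from the abstract: y_i in {0, 1} is the label of gene i (for example, 1 for pro-longevity), x_i is its feature vector (gene ontology or expression features), lambda >= 0 is the overall penalty strength, and alpha in [0, 1] is the mixing weight between the lasso and ridge penalties.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \alpha\,\lVert\beta\rVert_1
      + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Bigr]
```

Setting alpha = 1 recovers the lasso penalty, which drives some coefficients exactly to zero, while alpha = 0 recovers the ridge penalty, which only shrinks them. Intermediate values combine the two, which can help when features are numerous and correlated, as annotation and expression features often are.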