Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach

Masoud Arabfard; Mina Ohadi; Vahid Rezaei Tabar; Ahmad Delbari; Kaveh Kavousi

doi:10.1186/s12864-019-6140-0

BMC Genomics (Nov 2019)

Affiliations

Masoud Arabfard: Department of Bioinformatics, Kish International Campus, University of Tehran
Mina Ohadi: Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences
Vahid Rezaei Tabar: Department of Statistics, Faculty of Mathematical Sciences and Computer, Allameh Tabataba’i University
Ahmad Delbari: Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences
Kaveh Kavousi: Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran

DOI: https://doi.org/10.1186/s12864-019-6140-0
Journal volume & issue: Vol. 20, no. 1
pp. 1 – 13

Abstract

Background: Machine learning can effectively nominate novel genes for various research purposes in the laboratory. On a genome-wide scale, we used multiple databases and algorithms to predict and prioritize human aging genes (PPHAGE).

Results: We fused data from 11 databases, and used a Naïve Bayes classifier and positive unlabeled learning (PUL) methods (NB, Spy, and Rocchio-SVM) to rank human genes with respect to their implication in aging. The PUL methods enabled us to identify a list of negative (non-aging) genes to use alongside the seed (known age-related) genes in the ranking process. Comparison of the PUL algorithms revealed that none of the methods for identifying negative samples was advantageous over the others, and that their simultaneous use in the form of a fusion was critical for obtaining optimal results. PPHAGE is publicly available at https://cbb.ut.ac.ir/pphage.

Conclusion: We predict and prioritize over 3,000 candidate age-related genes in humans, based on significant ranking scores. The identified candidate genes are associated with pathways, ontologies, and diseases that are linked to aging, such as cancer and diabetes. Our data offer a platform for future experimental research on the genetic and biological aspects of aging. Additionally, we demonstrate that fusion of PUL methods and data sources can be successfully used for aging and disease candidate gene prioritization.
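For readers unfamiliar with positive unlabeled learning, the following is a minimal sketch of the general Spy technique that the abstract names as one way to identify negative (non-aging) genes. It is not the authors' PPHAGE implementation: the two-feature synthetic data, the hand-written Gaussian naive Bayes, the 15% spy fraction, and every function and variable name are assumptions made only for illustration.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Fit per-class log prior, feature means and variances (labels 0 and 1)."""
    model = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n_feat = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]
        # Small constant keeps the variance strictly positive.
        varis = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-6
                 for j in range(n_feat)]
        model[label] = (math.log(len(rows) / len(X)), means, varis)
    return model


def posterior_positive(model, x):
    """Return P(label = 1 | x) under the fitted naive Bayes model."""
    log_joint = {}
    for label, (log_prior, means, varis) in model.items():
        total = log_prior
        for xj, m, v in zip(x, means, varis):
            total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_joint[label] = total
    top = max(log_joint.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    return weights[1] / (weights[0] + weights[1])


def spy_reliable_negatives(positives, unlabeled, spy_fraction=0.15, seed=0):
    """Return indices of unlabeled items scored below every spy."""
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_fraction))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept = [p for i, p in enumerate(positives) if i not in spy_idx]
    # Spies are hidden among the unlabeled items, which are treated as negative.
    X = kept + unlabeled + spies
    y = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    model = fit_gaussian_nb(X, y)
    # Spies are true positives, so the lowest spy score sets the threshold.
    threshold = min(posterior_positive(model, s) for s in spies)
    return [i for i, u in enumerate(unlabeled)
            if posterior_positive(model, u) < threshold]


# Synthetic stand-in for gene feature vectors (assumption, not real data).
data_rng = random.Random(42)


def cloud(center, n):
    return [[data_rng.gauss(center, 1.0), data_rng.gauss(center, 1.0)]
            for _ in range(n)]


seed_genes = cloud(3.0, 40)                         # known positives
unlabeled_genes = cloud(3.0, 20) + cloud(-3.0, 60)  # hidden positives + negatives
reliable = spy_reliable_negatives(seed_genes, unlabeled_genes)
print(len(reliable), "of", len(unlabeled_genes), "unlabeled items flagged as reliable negatives")
```

In a pipeline of the kind the abstract describes, the flagged items would serve as the negative set alongside the seed genes when training the final ranking classifier. Using the minimum spy score as the threshold is the strictest choice; a low percentile of the spy scores is a common relaxation when some seed labels may be noisy.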

Published in BMC Genomics

ISSN: 1471-2164 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Biology (General): Genetics
Website: http://bmcgenomics.biomedcentral.com
