Non‐homology‐based prediction of gene functions in maize (Zea mays ssp. mays)

Xiuru Dai; Zheng Xu; Zhikai Liang; Xiaoyu Tu; Silin Zhong; James C. Schnable; Pinghua Li

doi:10.1002/tpg2.20015

The Plant Genome (Jul 2020)

Affiliations

Xiuru Dai: State Key Laboratory of Crop Biology, Shandong Agricultural University, Taian 273100, China
Zheng Xu: Department of Mathematics and Statistics, Wright State University, Dayton, OH 45435, USA
Zhikai Liang: Quantitative Life Sciences Initiative, Center for Plant Science Innovation, and Department of Agronomy and Horticulture, University of Nebraska‐Lincoln, Lincoln, NE 68588, USA
Xiaoyu Tu: State Key Laboratory of Agrobiotechnology, School of Life Sciences, Chinese University of Hong Kong, Hong Kong, China
Silin Zhong: State Key Laboratory of Agrobiotechnology, School of Life Sciences, Chinese University of Hong Kong, Hong Kong, China
James C. Schnable: Quantitative Life Sciences Initiative, Center for Plant Science Innovation, and Department of Agronomy and Horticulture, University of Nebraska‐Lincoln, Lincoln, NE 68588, USA
Pinghua Li: State Key Laboratory of Crop Biology, Shandong Agricultural University, Taian 273100, China

DOI: https://doi.org/10.1002/tpg2.20015
Journal volume & issue: Vol. 13, no. 2

Abstract

Advances in genome sequencing and annotation have made it easier to identify new gene sequences, but predicting the functions of these newly identified genes remains challenging. Genes descended from a common ancestral sequence are likely to have common functions; as a result, homology is widely used for gene function prediction. A consequence is that functional annotation errors also propagate from one species to another. Several approaches based on machine learning classification algorithms were evaluated for their ability to accurately predict gene function from non‐homology gene features. Among the eight supervised classification algorithms evaluated, random‐forest‐based prediction consistently provided the most accurate gene function prediction. Non‐homology‐based functional annotation provides complementary strengths to homology‐based annotation: it shows higher average performance in Biological Process GO terms, the domain where homology‐based functional annotation performs the worst, and weaker performance in Molecular Function GO terms, the domain where the accuracy of homology‐based functional annotation is highest. GO prediction models trained with homology‐based annotations were able to successfully predict annotations from a manually curated “gold standard” GO annotation set. Non‐homology‐based functional annotation based on machine learning may ultimately prove useful both for assigning predicted functions to orphan genes that lack functionally characterized homologs and for identifying and correcting functional annotation errors that were propagated through homology‐based functional annotations.

Published in The Plant Genome

ISSN: 1940-3372 (Online)
Publisher: Wiley
Country of publisher: United States
LCC subjects: Agriculture: Plant culture; Science: Biology (General): Genetics
Website: https://acsess.onlinelibrary.wiley.com/journal/19403372
