CALANGO: A phylogeny-aware comparative genomics tool for discovering quantitative genotype-phenotype associations across species
Jorge Augusto Hongo,
Giovanni Marques de Castro,
Alison Pelri Albuquerque Menezes,
Agnello César Rios Picorelli,
Thieres Tayroni Martins da Silva,
Eddie Luidy Imada,
Luigi Marchionni,
Luiz-Eduardo Del-Bem,
Anderson Vieira Chaves,
Gabriel Magno de Freitas Almeida,
Felipe Campelo,
Francisco Pereira Lobo
Affiliations
Jorge Augusto Hongo
Instituto de Computação, Universidade Estadual de Campinas, Campinas, Sao Paulo 13083-872, Brazil
Giovanni Marques de Castro
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Alison Pelri Albuquerque Menezes
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Agnello César Rios Picorelli
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Thieres Tayroni Martins da Silva
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Eddie Luidy Imada
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
Luigi Marchionni
Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10021, USA
Luiz-Eduardo Del-Bem
Department of Botany, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Anderson Vieira Chaves
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil
Gabriel Magno de Freitas Almeida
Faculty of Biosciences, Fisheries and Economics, Norwegian College of Fishery Science, UiT The Arctic University of Norway, 9019 Tromsø, Norway
Felipe Campelo
Department of Computer Science, College of Engineering and Physical Sciences, Aston University, Birmingham B4 7ET, UK
Francisco Pereira Lobo
Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais 31270-901, Brazil; Corresponding author
Summary: Living species vary significantly in phenotype and genomic content. Sophisticated statistical methods linking genes with phenotypes within a species have led to breakthroughs in complex genetic diseases and genetic breeding. Despite the abundance of genomic and phenotypic data available for thousands of species, finding genotype-phenotype associations across species is challenging due to the non-independence of species data resulting from common ancestry. To address this, we present CALANGO (comparative analysis with annotation-based genomic components), a phylogeny-aware comparative genomics tool to find homologous regions and biological roles associated with quantitative phenotypes across species. In two case studies, CALANGO identified both known and previously unidentified genotype-phenotype associations. The first study revealed unknown aspects of the ecological interaction between Escherichia coli, its integrated bacteriophages, and the pathogenicity phenotype. The second identified an association between maximum height in angiosperms and the expansion of a reproductive mechanism that prevents inbreeding and increases genetic diversity, with implications for conservation biology and agriculture.
The bigger picture: Life is a complex and varied phenomenon with a wide range of phenotypic and genotypic variations. The search for the putative genetic mechanisms associated with, and eventually playing causal roles in, the phenotypic differences between species remains a key question in biology. We introduce CALANGO, a comparative genomics tool to search for genome-wide genotype-phenotype associations across species, taking advantage of the large amounts of phenotypic data available for species with complete genomes. Our tool uses phylogeny-aware linear models to account for the non-independence of species data and can be used to detect both homologous regions and molecular functional convergences associated with phenotypes. Through two case studies, we show how CALANGO can be used to investigate the genomic and functional evolution of distinct complex phenotypes and to select targets for experimental characterization.