Patterns (Jun 2023)

CALANGO: A phylogeny-aware comparative genomics tool for discovering quantitative genotype-phenotype associations across species

  • Jorge Augusto Hongo,
  • Giovanni Marques de Castro,
  • Alison Pelri Albuquerque Menezes,
  • Agnello César Rios Picorelli,
  • Thieres Tayroni Martins da Silva,
  • Eddie Luidy Imada,
  • Luigi Marchionni,
  • Luiz-Eduardo Del-Bem,
  • Anderson Vieira Chaves,
  • Gabriel Magno de Freitas Almeida,
  • Felipe Campelo,
  • Francisco Pereira Lobo

Journal volume & issue
Vol. 4, no. 6
p. 100728

Abstract

Summary: Living species vary significantly in phenotype and genomic content. Sophisticated statistical methods linking genes with phenotypes within a species have led to breakthroughs in complex genetic diseases and genetic breeding. Despite the abundance of genomic and phenotypic data available for thousands of species, finding genotype-phenotype associations across species is challenging due to the non-independence of species data resulting from common ancestry. To address this, we present CALANGO (comparative analysis with annotation-based genomic components), a phylogeny-aware comparative genomics tool to find homologous regions and biological roles associated with quantitative phenotypes across species. In two case studies, CALANGO identified both known and previously unidentified genotype-phenotype associations. The first study revealed unknown aspects of the ecological interaction between Escherichia coli, its integrated bacteriophages, and the pathogenicity phenotype. The second identified an association between maximum height in angiosperms and the expansion of a reproductive mechanism that prevents inbreeding and increases genetic diversity, with implications for conservation biology and agriculture.

The bigger picture: Life is a complex and varied phenomenon with a wide range of phenotypic and genotypic variations. The search for the putative genetic mechanisms associated with, and eventually playing causal roles in, the phenotypic differences between species remains a key question in biology. We introduce CALANGO, a comparative genomics tool to search for genome-wide genotype-phenotype associations across species, taking advantage of the large amounts of phenotypic data available for species with complete genomes. Our tool uses phylogeny-aware linear models to account for the non-independence of species data and can be used to detect both homologous regions and molecular functional convergences associated with phenotypes. Through two case studies, we show how CALANGO can be used to investigate the genomic and functional evolution of distinct complex phenotypes and to select targets for experimental characterization.

Keywords