Ecology and Evolution (Oct 2021)

BASE: A novel workflow to integrate nonubiquitous genes in comparative genomics analyses for selection

  • Giobbe Forni,
  • Angelo Alberto Ruggieri,
  • Giovanni Piccinini,
  • Andrea Luchetti

DOI
https://doi.org/10.1002/ece3.7959
Journal volume & issue
Vol. 11, no. 19
pp. 13029 – 13035

Abstract

Inferring the selective forces that orthologous genes underwent across different lineages can help us understand the evolutionary processes that have shaped their extant diversity and the phenotypes they underlie. The most widespread metric for estimating the selection regimes of coding genes, across sites and phylogenies, is the ratio of nonsynonymous to synonymous substitution rates (dN/dS, also known as ω). Modern sequencing technologies and the large amount of already available sequence data allow the retrieval of thousands of orthologous genes across large numbers of species. Nonetheless, the tools available to explore selection regimes are not designed to process all genes automatically, and their practical use is often restricted to the single-copy genes found across all species considered (i.e., ubiquitous genes). This approach limits the scale of the analysis to a fraction of single-copy genes, which can be as much as an order of magnitude fewer than those not consistently found in all species considered (i.e., nonubiquitous genes). Here, we present a workflow named BASE that, leveraging the CodeML framework, eases the inference and interpretation of gene selection regimes in the context of comparative genomics. Although a number of bioinformatics tools have already been developed to facilitate this kind of analysis, BASE is the first specifically designed to allow the integration of nonubiquitous genes in a straightforward and reproducible manner. The workflow, along with all relevant documentation, is available at github.com/for-giobbe/BASE.
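
The distinction between ubiquitous and nonubiquitous genes can be illustrated with a minimal Python sketch. This is not part of BASE and does not reflect its implementation; the function name `split_orthogroups`, the dictionary layout, and the toy species and gene identifiers are all hypothetical, chosen only to show how single-copy orthogroups might be partitioned by species coverage.

```python
def split_orthogroups(orthogroups, species):
    """Partition single-copy orthogroups by species coverage.

    orthogroups: dict mapping an orthogroup name to a dict of
                 species -> list of gene IDs (hypothetical layout).
    species:     list of all species considered in the analysis.
    """
    all_species = set(species)
    ubiquitous, nonubiquitous = [], []
    for name, members in orthogroups.items():
        # Keep only single-copy groups: exactly one gene per species present.
        if any(len(genes) != 1 for genes in members.values()):
            continue
        if set(members) == all_species:
            ubiquitous.append(name)      # found in every species
        else:
            nonubiquitous.append(name)   # missing from at least one species
    return ubiquitous, nonubiquitous


# Toy data (illustrative only).
species = ["sp1", "sp2", "sp3"]
orthogroups = {
    "OG1": {"sp1": ["g1"], "sp2": ["g2"], "sp3": ["g3"]},        # ubiquitous
    "OG2": {"sp1": ["g4"], "sp3": ["g5"]},                       # nonubiquitous
    "OG3": {"sp1": ["g6", "g7"], "sp2": ["g8"], "sp3": ["g9"]},  # multi-copy, skipped
}
ubiq, nonubiq = split_orthogroups(orthogroups, species)
print("ubiquitous:", ubiq)
print("nonubiquitous:", nonubiq)
```

In this toy case, an analysis restricted to ubiquitous genes would retain only OG1, whereas a workflow that integrates nonubiquitous genes could also consider OG2.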

Keywords