G3: Genes, Genomes, Genetics (Mar 2020)

An Automated Method To Predict Mouse Gene and Protein Sequences Using Variant Data

  • Peter Dornbos,
  • Anooj A. Arkatkar,
  • John J. LaPres

DOI
https://doi.org/10.1534/g3.119.400983
Journal volume & issue
Vol. 10, no. 3
pp. 925 – 932

Abstract

With recent advances in sequencing technologies, the scientific community has begun to probe the potential genetic bases of complex phenotypes in humans and model organisms. In many cases, however, the genomes of genetically distinct strains of model organisms, such as the mouse (Mus musculus), have not been fully sequenced. Here, we report on a tool designed to use single-nucleotide polymorphism (SNP) and insertion-deletion (indel) data to predict gene, mRNA, and protein sequences for up to 36 genetically distinct mouse strains. Because the software automatically queries freely accessible databases through a graphical interface, it requires no user-supplied data and little computational experience. As a proof of concept, we predicted the gene and amino acid sequences of the aryl hydrocarbon receptor (Ahr) for all inbred mouse strains for which variant data were available through the Mouse Genome Project. Comparing the predicted sequences with fully sequenced genomes showed that the tool is effective in predicting gene and protein sequences.
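The general idea of predicting a strain-specific sequence from variant data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `apply_variants` and `translate`, the VCF-style `(position, ref, alt)` tuples, and the toy reference sequence are all assumptions made for illustration, and the sketch assumes non-overlapping variants on a coding sequence that is already in frame.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_variants(reference, variants):
    """Apply SNPs and indels to a reference sequence.

    `variants` is a list of hypothetical (pos, ref, alt) tuples with
    1-based positions, in the style of a VCF record. Variants are applied
    from the rightmost position to the leftmost so that an indel never
    shifts the coordinates of a variant still waiting to be applied.
    Overlapping variants are not handled.
    """
    seq = reference
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy reference coding sequence (not a real gene): ATG GCC TTT TAA
reference_cds = "ATGGCCTTTTAA"
strain_variants = [
    (5, "C", "T"),     # SNP: GCC (Ala) becomes GTC (Val)
    (6, "CTTT", "C"),  # 3-bp deletion that removes the TTT codon
]

strain_cds = apply_variants(reference_cds, strain_variants)
strain_protein = translate(strain_cds)
```

Applying edits from right to left is a design choice that keeps the reference coordinates valid throughout; a real pipeline would also need to handle introns, strand orientation, and frameshifts, which this sketch omits.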

Keywords