SoftwareX (Jan 2018)

SimPrily: A Python framework to simplify high-throughput genomic simulations

  • Ariella L. Gladstein,
  • Consuelo D. Quinto-Cortés,
  • Julian L. Pistorius,
  • David Christy,
  • Logan Gantner,
  • Blake L. Joyce

Journal volume & issue
Vol. 7
pp. 335 – 340

Abstract

Genomic simulations are an important technique in population genetics, used to infer demographic history, test for regions under selection, and create datasets to validate software. However, running thousands of simulations and manipulating large loci can present computational challenges. We present SimPrily, a Python tool optimized for high-throughput computing (HTC), which facilitates simulation of whole chromosomes. SimPrily can use prior distributions of parameters to run simulations, incorporate single nucleotide polymorphism (SNP) array ascertainment bias into the simulated model, and calculate a variety of genomic summary statistics. We include with SimPrily high-throughput workflows that leverage free computing resources through the Open Science Grid and the CyVerse Discovery Environment, allowing researchers to run thousands or millions of large-locus simulations with minimal or no prior command-line knowledge.

Keywords: Genomics, Coalescent simulation, High-throughput computing, Demographic history
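
The three ideas named in the abstract (drawing parameters from priors, SNP array ascertainment, and summary statistics) can be illustrated with a minimal Python sketch. It is not SimPrily's code or API: the function names, the parameter names and ranges, the uniform priors, and the toy haplotype matrix are all assumptions made for illustration, and a real workflow would obtain haplotypes from a coalescent simulator rather than a hand-written matrix.

```python
import random

def draw_parameters(priors, rng):
    """Draw one value per parameter from a uniform prior given as (low, high)."""
    return {name: rng.uniform(low, high) for name, (low, high) in priors.items()}

def segregating_sites(haplotypes):
    """Count sites (columns) at which both alleles, 0 and 1, are present."""
    return sum(1 for site in zip(*haplotypes) if 0 < sum(site) < len(site))

def ascertain(haplotypes, panel_rows):
    """Keep only the sites that are polymorphic within the discovery panel rows."""
    keep = [
        j for j, site in enumerate(zip(*haplotypes))
        if 0 < sum(site[i] for i in panel_rows) < len(panel_rows)
    ]
    return [[row[j] for j in keep] for row in haplotypes]

# Hypothetical priors: names and ranges are illustrative only.
priors = {"effective_size": (1_000, 50_000), "split_time_gen": (100, 5_000)}
rng = random.Random(42)  # fixed seed so the draw is reproducible
params = draw_parameters(priors, rng)

# Toy haplotype matrix standing in for simulator output:
# rows are sampled chromosomes, columns are sites (0 = ancestral, 1 = derived).
haplotypes = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]

full_s = segregating_sites(haplotypes)            # all sequenced sites
array_like = ascertain(haplotypes, panel_rows=[0, 1])
ascertained_s = segregating_sites(array_like)     # sites an array would carry

print(params)
print("segregating sites (full data):", full_s)
print("segregating sites (after ascertainment):", ascertained_s)
```

In this toy case the first site is variable in the full sample but not in the two-chromosome discovery panel, so it is dropped after ascertainment. That loss of variants absent from the discovery panel is the kind of bias that modeling array ascertainment is meant to capture.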