Frontiers in Plant Science (May 2019)
Computational Analysis of AmpSeq Data for Targeted, High-Throughput Genotyping of Amplicons
Abstract
Amplicon sequencing (AmpSeq) is a practical, intuitive genotyping strategy, supported by a semi-automated computational pipeline, for analyzing highly multiplexed PCR-derived sequences. This genotyping platform is particularly cost-effective when multiplexing 96 or more samples with anywhere from a few to thousands of amplicons. Amplicons can target regions ranging from a single nucleotide up to the maximum read length supported by the sequencing platform. The flexibility of AmpSeq’s wet lab methods makes it a tool of broad interest across diverse species, and AmpSeq combines high throughput, low cost, accuracy, and semi-automated analysis. Here we provide an open science framework procedure for generating output from an AmpSeq project, with an emphasis on the bioinformatics pipeline used to call SNPs, haplotypes, and presence/absence variants in a set of diverse genotypes. Open-access tutorial datasets with real data and an open-source software container are provided to enable training in each of these genotyping applications. The pipelines presented here should be applicable to the analysis of various target-enriched (e.g., amplicon or sequence capture) Illumina sequence data.
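To make the three genotyping outputs concrete (presence/absence variants, haplotypes, and SNPs), the following is a minimal, hypothetical sketch of how reads from one sample at one amplicon might be summarized. It is not the AmpSeq pipeline itself. It assumes reads have already been demultiplexed, primer-trimmed, and assigned to an amplicon. The thresholds `MIN_READS` and `MIN_FRACTION` and the helper names `call_amplicon` and `snps_vs_reference` are illustrative assumptions, and the SNP helper ignores indels by requiring pre-aligned, equal-length sequences.

```python
from collections import Counter

# Hypothetical thresholds for illustration only; not parameters of the AmpSeq pipeline.
MIN_READS = 10       # below this total depth, the amplicon is called "absent"
MIN_FRACTION = 0.1   # haplotypes below this share of reads are treated as noise


def call_amplicon(reads, min_reads=MIN_READS, min_fraction=MIN_FRACTION):
    """Call presence/absence and haplotypes for one sample at one amplicon.

    `reads` is a list of amplicon sequences (strings) assumed to be already
    demultiplexed and primer-trimmed. Returns a dict with a presence flag and
    the retained haplotypes as (sequence, read_count), most abundant first.
    """
    total = len(reads)
    if total < min_reads:
        # Too few reads: treat the amplicon as absent in this sample.
        return {"present": False, "haplotypes": []}
    counts = Counter(reads)
    # Keep each distinct sequence whose share of reads passes the noise filter.
    kept = [(seq, n) for seq, n in counts.most_common() if n / total >= min_fraction]
    return {"present": True, "haplotypes": kept}


def snps_vs_reference(haplotype, reference):
    """List SNPs as (0-based position, ref_base, alt_base).

    Simplification: the haplotype and reference must already be aligned and of
    equal length, so insertions and deletions are not handled.
    """
    if len(haplotype) != len(reference):
        raise ValueError("sketch only handles equal-length, pre-aligned sequences")
    return [(i, r, h) for i, (r, h) in enumerate(zip(reference, haplotype)) if r != h]


if __name__ == "__main__":
    reads = ["ACGTA"] * 40 + ["ACGTT"] * 35 + ["ACGAA"] * 2
    result = call_amplicon(reads)
    print(result)
    for seq, _ in result["haplotypes"]:
        print(seq, snps_vs_reference(seq, "ACGTA"))
```

In this toy input, the two abundant sequences are retained as haplotypes, while the 2-read sequence falls below the 10% filter and is discarded. A real pipeline would also need to handle sequencing error models, indels, and ploidy, which this sketch deliberately omits.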
Keywords