The Plant Genome (Jul 2009)

Large-Scale Discovery of Gene-Enriched SNPs

  • Michael A. Gore,
  • Mark H. Wright,
  • Elhan S. Ersoz,
  • Pascal Bouffard,
  • Edward S. Szekeres,
  • Thomas P. Jarvie,
  • Bonnie L. Hurwitz,
  • Apurva Narechania,
  • Timothy T. Harkins,
  • George S. Grills,
  • Doreen H. Ware,
  • Edward S. Buckler

DOI
https://doi.org/10.3835/plantgenome2009.01.0002
Journal volume & issue
Vol. 2, no. 2
pp. 121 – 133

Abstract

Whole-genome association studies of complex traits in higher eukaryotes require a high density of single nucleotide polymorphism (SNP) markers at genome-wide coverage. To design high-throughput, multiplexed SNP genotyping assays, researchers must first discover large numbers of SNPs by extensively resequencing multiple individuals or lines. For SNP discovery approaches that rely on the short read lengths of next-generation DNA sequencing technologies, the highly repetitive and duplicated nature of large plant genomes presents additional challenges. Here, we describe a genomic library construction procedure that facilitates pyrosequencing of genic and low-copy regions in plant genomes, and a customized computational pipeline to analyze and assemble short reads (100–200 bp), identify allelic reference sequence comparisons, and call SNPs with a high degree of accuracy. With maize (Zea mays L.) as the test organism in a pilot experiment, the implementation of these methods resulted in the identification of 126,683 putative SNPs between two maize inbred lines at an estimated false discovery rate (FDR) of 15.1%. We estimated rates of false SNP discovery using an internal control, and we validated these FDR estimates with an external SNP dataset that was generated using locus-specific PCR amplification and Sanger sequencing. These results show that this approach has wide applicability for efficiently and accurately detecting gene-enriched SNPs in large, complex plant genomes.
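
To make the reported figures concrete, the following minimal Python sketch works through the arithmetic implied by an FDR of 15.1% on 126,683 putative SNPs. It is an illustration only, not the authors' pipeline: the function names are hypothetical, and the validation counts in the second function's example are invented placeholders, not data from the study.

```python
def expected_false_discoveries(n_putative: int, fdr: float) -> int:
    """Expected number of false SNP calls, given a call count and an FDR."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return round(n_putative * fdr)


def fdr_from_validation(n_false: int, n_true: int) -> float:
    """FDR as the fraction of validated calls that turned out to be false.

    The counts would come from comparing calls against an independent
    dataset; this helper is a hypothetical illustration of the definition.
    """
    total = n_false + n_true
    if total == 0:
        raise ValueError("at least one validated call is required")
    return n_false / total


if __name__ == "__main__":
    n_putative = 126_683   # putative SNPs reported in the abstract
    fdr = 0.151            # estimated FDR reported in the abstract

    n_false = expected_false_discoveries(n_putative, fdr)
    print("expected false SNPs:", n_false)
    print("expected true SNPs: ", n_putative - n_false)

    # Placeholder counts (NOT from the study), showing the definition only.
    print("example FDR:", fdr_from_validation(n_false=15, n_true=85))
```

Under these assumptions, roughly 19,000 of the putative SNPs would be expected to be false calls, leaving on the order of 107,000 true SNPs.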