PLoS ONE (Jan 2017)

The impact of RNA secondary structure on read start locations on the Illumina sequencing platform.

  • Adam Price,
  • Jaishree Garhyan,
  • Cynthia Gibas

DOI
https://doi.org/10.1371/journal.pone.0173023
Journal volume & issue
Vol. 12, no. 2
p. e0173023

Abstract

High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is GC content bias, in which the GC content of a genomic region affects the number of reads produced during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias.