PLoS ONE (Jan 2015)

Impact of Pre-Analytical Variables on Cancer Targeted Gene Sequencing Efficiency.

  • Luiz H Araujo,
  • Cynthia Timmers,
  • Konstantin Shilo,
  • Weiqiang Zhao,
  • Jianying Zhang,
  • Lianbo Yu,
  • Thanemozhi G Natarajan,
  • Clinton J Miller,
  • Ayse Selen Yilmaz,
  • Tom Liu,
  • Joseph Amann,
  • José Roberto Lapa E Silva,
  • Carlos Gil Ferreira,
  • David P Carbone

DOI
https://doi.org/10.1371/journal.pone.0143092
Journal volume & issue
Vol. 10, no. 11
p. e0143092

Abstract

Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next-generation sequencing platform. A PCR-based quality control (QC) assay was used to determine DNA quality, and a ratio was generated by comparison with control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated with most parameters of sequencing efficiency, including depth of coverage, alignment rate, insert size, and read quality. A combined score based on these three parameters was generated and proved highly accurate in predicting sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content, such as in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples of poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are used.
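
As an illustration of the GC-content measure referred to in the abstract, the following is a minimal Python sketch and not the authors' pipeline. The function names `gc_fraction` and `windowed_gc`, the window size, and the choice to ignore ambiguous bases such as N are all assumptions made for this example; the abstract does not state how GC content was computed.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq.

    Ambiguous bases such as N are ignored (an assumption for this sketch).
    Returns 0.0 when the sequence has no unambiguous bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int = 100, step: int = 100):
    """Return (start, gc_fraction) pairs for windows along seq.

    Only full-length windows are reported, except that a sequence shorter
    than one window yields a single window covering the whole sequence.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, last_start, step)
    ]
```

For example, `windowed_gc(region, window=100, step=50)` on a hypothetical target-region string `region` would give per-window GC fractions that could then be compared against per-window read depth.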