PLoS ONE (Jan 2014)

"Deep" sequencing accuracy and reproducibility using Roche/454 technology for inferring co-receptor usage in HIV-1.

  • David J H F Knapp,
  • Rachel A McGovern,
  • Art F Y Poon,
  • Xiaoyin Zhong,
  • Dennison Chan,
  • Luke C Swenson,
  • Winnie Dong,
  • P Richard Harrigan

DOI
https://doi.org/10.1371/journal.pone.0099508
Journal volume & issue
Vol. 9, no. 6
p. e99508

Abstract

Next generation, "deep", sequencing has increasing applications both clinically and in disparate fields of research. This study investigates the accuracy and reproducibility of "deep" sequencing as applied to co-receptor prediction using the V3 loop of Human Immunodeficiency Virus-1. Despite increasing use in HIV co-receptor prediction, the accuracy and reproducibility of deep sequencing technology, and the factors which can affect them, have received only limited investigation. To address this, repeated deep sequencing results were generated using the Roche GS-FLX (454) from a number of sources, including a non-homogeneous clinical sample (N = 47 replicates over 18 deep sequencing runs) and a large clinical cohort from the MOTIVATE and A400129 studies (N = 1521). For repeated measurements of the non-homogeneous clinical sample, increasing input copy number both decreased variance in the measured proportion of non-R5-using virus (p<<0.001 and 0.02 for single replicates and triplicates, respectively) and increased measured viral diversity (p<0.001; multiple measures). Sequences with a mean abundance below 1% showed a 2-fold increase in median coefficient of variation (CV) in repeated measurements of the non-homogeneous clinical sample, and a 2.7-fold increase in CV in the MOTIVATE/A400129 dataset, compared to sequences with ≥1% abundance. An unexpected source of error was read position, with low-accuracy reads occurring more frequently towards the edges of sequencing regions (p<<0.001). Overall, the primary source of variability was sampling error caused by low input copy number or low minority species prevalence, though other sources of error, including sequence-intrinsic, temporal, and read-position-related errors, were also detected.
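
The link between input copy number, minority species prevalence, and CV can be illustrated with a simple model. The sketch below is not the analysis used in the study: it assumes that each input template is an independent draw from the viral population (binomial sampling), and the function names `binomial_cv` and `coefficient_of_variation` are hypothetical. Under that assumption, the CV of a measured variant proportion is sqrt((1 - p) / (n * p)) for prevalence p and n input copies, so it grows as either quantity falls.

```python
import math
import statistics


def binomial_cv(copies: int, prevalence: float) -> float:
    """Expected CV of a measured variant proportion under binomial sampling.

    Assumption (not from the study): each of the `copies` input templates
    is an independent draw from a population in which the variant has the
    given true prevalence.
    """
    if copies <= 0:
        raise ValueError("copies must be positive")
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    # SD of the proportion is sqrt(p * (1 - p) / n); dividing by the mean p
    # gives sqrt((1 - p) / (n * p)).
    return math.sqrt((1 - prevalence) / (copies * prevalence))


def coefficient_of_variation(measurements: list[float]) -> float:
    """Empirical CV (sample SD divided by mean) of replicate measurements."""
    mean = statistics.mean(measurements)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(measurements) / mean


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    for copies in (100, 1_000, 10_000):
        for prevalence in (0.005, 0.05):
            cv = binomial_cv(copies, prevalence)
            print(f"copies={copies:>6} prevalence={prevalence:.3f} CV={cv:.3f}")
```

In this model a variant at 0.5% prevalence has a higher expected CV than one at 5% for the same input, and a 100-fold increase in input copies lowers the CV 10-fold. This is consistent in direction with the trends reported above, although the model ignores PCR and sequencing error.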