PLoS ONE (Jan 2014)

PSCC: sensitive and reliable population-scale copy number variation detection method based on low coverage sequencing.

  • Xuchao Li,
  • Shengpei Chen,
  • Weiwei Xie,
  • Ida Vogel,
  • Kwong Wai Choy,
  • Fang Chen,
  • Rikke Christensen,
  • Chunlei Zhang,
  • Huijuan Ge,
  • Haojun Jiang,
  • Chang Yu,
  • Fang Huang,
  • Wei Wang,
  • Hui Jiang,
  • Xiuqing Zhang

DOI
https://doi.org/10.1371/journal.pone.0085096
Journal volume & issue
Vol. 9, no. 1
p. e85096

Abstract

Background

Copy number variations (CNVs) are an important type of genetic variation that strongly affects phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most existing methods depend on sequencing depth and become unstable at low sequence coverage. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low coverage whole-genome sequencing (LCS).

Methodology/principal findings

In our method, a two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to keep false positive control stable. On simulated data, for CNVs larger than 300 kb, PSCC achieved a sensitivity/specificity of 99.7%/100% with LCS (∼2×) and 98.6%/100% with ultra LCS (∼0.2×). Finally, we applied the method to 34 clinical samples sequenced at an average coverage of 2×. All 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, a performance comparison showed that our method had significant advantages over existing methods when using ultra LCS.

Conclusions/significance

Our study showed that PSCC can sensitively and reliably detect CNVs from low coverage or even ultra-low coverage data through population-scale sequencing.
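
The abstract names GC-bias correction and binary segmentation without giving their details, so the following is only a minimal Python sketch of the general idea, not the PSCC implementation. It assumes per-bin read depths and GC fractions are already available, uses a single GC-stratified median rescaling (the second correction step and the combined statistical tests are not modeled), and splits segments by a plain least-squares criterion. The function names, the bin layout, and the `min_gain` and `min_size` thresholds are all hypothetical choices made for illustration.

```python
from statistics import mean, median


def gc_correct(depths, gc_fractions, n_strata=20):
    """Rescale each bin so every GC stratum has the same median depth.

    Assumption: one simple GC-stratified median correction; the paper's
    actual two-step correction is not specified in the abstract.
    """
    global_median = median(depths)
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    by_stratum = {}
    for depth, key in zip(depths, keys):
        by_stratum.setdefault(key, []).append(depth)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}
    return [
        d * global_median / stratum_median[k] if stratum_median[k] > 0 else d
        for d, k in zip(depths, keys)
    ]


def best_split(values, min_size):
    """Find the split index that most reduces the within-segment squared error."""
    n = len(values)
    total = sum(values)
    best_index, best_gain = None, 0.0
    left = 0.0
    for i in range(1, n):
        left += values[i - 1]
        if i < min_size or n - i < min_size:
            continue  # keep both halves at least min_size bins long
        right = total - left
        gain = left * left / i + right * right / (n - i) - total * total / n
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain


def binary_segment(values, start=0, min_gain=0.1, min_size=3):
    """Recursively split depth ratios; return half-open (start, end) bin ranges."""
    index, gain = best_split(values, min_size)
    if index is None or gain < min_gain:
        return [(start, start + len(values))]
    return (binary_segment(values[:index], start, min_gain, min_size)
            + binary_segment(values[index:], start + index, min_gain, min_size))


# Toy depth ratios: a 10-bin gain (ratio 1.5) inside a diploid background.
ratios = [1.0] * 20 + [1.5] * 10 + [1.0] * 20
for seg_start, seg_end in binary_segment(ratios):
    print(seg_start, seg_end, mean(ratios[seg_start:seg_end]))
```

In a real low-coverage setting the ratios would be noisy, so a fixed `min_gain` would need to be replaced by a statistical test on each candidate segment, which is the role the abstract assigns to its combined statistical tests.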