Genome Biology (Dec 2022)

Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies

  • Keyur Talsania,
  • Tsai-wei Shen,
  • Xiongfong Chen,
  • Erich Jaeger,
  • Zhipan Li,
  • Zhong Chen,
  • Wanqiu Chen,
  • Bao Tran,
  • Rebecca Kusko,
  • Limin Wang,
  • Andy Wing Chun Pang,
  • Zhaowei Yang,
  • Sulbha Choudhari,
  • Michael Colgan,
  • Li Tai Fang,
  • Andrew Carroll,
  • Jyoti Shetty,
  • Yuliya Kriga,
  • Oksana German,
  • Tatyana Smirnova,
  • Tiantain Liu,
  • Jing Li,
  • Ben Kellman,
  • Karl Hong,
  • Alex R. Hastie,
  • Aparna Natarajan,
  • Ali Moshrefi,
  • Anastasiya Granat,
  • Tiffany Truong,
  • Robin Bombardi,
  • Veronnica Mankinen,
  • Daoud Meerzaman,
  • Christopher E. Mason,
  • Jack Collins,
  • Eric Stahlberg,
  • Chunlin Xiao,
  • Charles Wang,
  • Wenming Xiao,
  • Yongmei Zhao

DOI
https://doi.org/10.1186/s13059-022-02816-6
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 33

Abstract


Background
The cancer genome is commonly altered by thousands of structural rearrangements, including insertions, deletions, translocations, inversions, duplications, and copy number variations. Structural variant (SV) characterization therefore plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and a matched normal control cell line derived from the same donor; these two cell lines served as reference samples in our companion benchmarking studies.

Results
We systematically investigated somatic SVs in the reference cancer cell line by comparison with the matched normal cell line, using multiple NGS platforms: Illumina short reads, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set for the reference cancer cell line comprising 1788 SVs: 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends. To independently evaluate and cross-validate the accuracy of this consensus call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and fusion genes identified from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV detection, and our findings provide an actionable guide for improving the sensitivity and accuracy of SV detection in cancer genomes.

Conclusions
A high-confidence consensus SV call set was established for the reference cancer cell line, and a large subset of the identified variants was validated by multiple orthogonal methods.
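As a quick consistency check on the reported composition of the consensus call set, the per-type counts from the abstract can be tallied and expressed as proportions of the whole. This is only an illustrative sketch: the dictionary and the `summarize` helper are hypothetical names introduced here, and the counts are the ones stated above, not a parsed call-set file.

```python
# Hypothetical helper: summarize the SV type counts reported in the abstract.
# The counts are copied from the text; no VCF or call-set file is read here.

SV_COUNTS: dict[str, int] = {
    "deletion": 717,
    "duplication": 230,
    "insertion": 551,
    "inversion": 133,
    "translocation": 146,
    "breakend": 11,
}


def summarize(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Return (type, count, percent of total) rows, largest class first."""
    total = sum(counts.values())
    rows = [(sv_type, n, 100.0 * n / total) for sv_type, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    rows = summarize(SV_COUNTS)
    total = sum(n for _, n, _ in rows)
    for sv_type, n, pct in rows:
        print(f"{sv_type:<14}{n:>5}  {pct:5.1f}%")
    print(f"{'total':<14}{total:>5}")
```

The per-type counts sum to the reported total of 1788, with deletions and insertions together making up roughly 70% of the call set.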

Keywords