Genome Biology (Dec 2022)
Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies
- Keyur Talsania,
- Tsai-wei Shen,
- Xiongfong Chen,
- Erich Jaeger,
- Zhipan Li,
- Zhong Chen,
- Wanqiu Chen,
- Bao Tran,
- Rebecca Kusko,
- Limin Wang,
- Andy Wing Chun Pang,
- Zhaowei Yang,
- Sulbha Choudhari,
- Michael Colgan,
- Li Tai Fang,
- Andrew Carroll,
- Jyoti Shetty,
- Yuliya Kriga,
- Oksana German,
- Tatyana Smirnova,
- Tiantain Liu,
- Jing Li,
- Ben Kellman,
- Karl Hong,
- Alex R. Hastie,
- Aparna Natarajan,
- Ali Moshrefi,
- Anastasiya Granat,
- Tiffany Truong,
- Robin Bombardi,
- Veronnica Mankinen,
- Daoud Meerzaman,
- Christopher E. Mason,
- Jack Collins,
- Eric Stahlberg,
- Chunlin Xiao,
- Charles Wang,
- Wenming Xiao,
- Yongmei Zhao
Affiliations
- Keyur Talsania
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- Tsai-wei Shen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- Xiongfong Chen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- Erich Jaeger
- Illumina Inc
- Zhipan Li
- Sentieon Inc
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research
- Rebecca Kusko
- Immuneering Corp
- Limin Wang
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute
- Andy Wing Chun Pang
- Bionano Genomics
- Zhaowei Yang
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University
- Sulbha Choudhari
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- Michael Colgan
- Center for Drug Evaluation and Research, FDA
- Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc
- Andrew Carroll
- DNAnexus
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research
- Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research
- Oksana German
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research
- Tatyana Smirnova
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research
- Tiantain Liu
- Center for Genomics, Loma Linda University School of Medicine
- Jing Li
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University
- Ben Kellman
- Bionano Genomics
- Karl Hong
- Bionano Genomics
- Alex R. Hastie
- Bionano Genomics
- Aparna Natarajan
- Illumina Inc
- Ali Moshrefi
- Illumina Inc
- Anastasiya Granat
- Illumina Inc
- Tiffany Truong
- Illumina Inc
- Robin Bombardi
- Illumina Inc
- Veronnica Mankinen
- Dovetail Genomics
- Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine
- Jack Collins
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- Eric Stahlberg
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine
- Wenming Xiao
- Center for Drug Evaluation and Research, FDA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research
- DOI
- https://doi.org/10.1186/s13059-022-02816-6
- Journal volume & issue
- Vol. 23, no. 1, pp. 1–33
Abstract
Abstract Background The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocation, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and matched normal control derived from the same donor, which were used in our companion benchmarking studies as reference samples. Results We systematically investigated somatic SVs in the reference cancer cell line by comparing to a matched normal cell line using multiple NGS platforms including Illumina short-read, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set of a total of 1788 SVs including 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends for the reference cancer cell line. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy. Conclusions A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
Keywords
- Structural variation
- Reference call set
- Cancer
- Multiple platforms
- Structural variant calling algorithm
- Next-generation sequencing technology