Scientific Reports (Mar 2024)
Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project
- Binsheng Gong
- Dan Li
- Yifan Zhang
- Rebecca Kusko
- Samir Lababidi
- Zehui Cao
- Mingyang Chen
- Ning Chen
- Qiaochu Chen
- Qingwang Chen
- Jiacheng Dai
- Qiang Gan
- Yuechen Gao
- Mingkun Guo
- Gunjan Hariani
- Yujie He
- Wanwan Hou
- He Jiang
- Garima Kushwaha
- Jian-Liang Li
- Jianying Li
- Yulan Li
- Liang-Chun Liu
- Ruimei Liu
- Shiming Liu
- Edwin Meriaux
- Mengqing Mo
- Mathew Moore
- Tyler J. Moss
- Quanne Niu
- Ananddeep Patel
- Luyao Ren
- Nedda F. Saremi
- Erfei Shang
- Jun Shang
- Ping Song
- Siqi Sun
- Brent J. Urban
- Danke Wang
- Shangzi Wang
- Zhining Wen
- Xiangyi Xiong
- Jingcheng Yang
- Lihui Yin
- Chao Zhang
- Ruolan Zhang
- Ambica Bhandari
- Wanshi Cai
- Agda Karina Eterovic
- Dalila B. Megherbi
- Tieliu Shi
- Chen Suo
- Ying Yu
- Yuanting Zheng
- Natalia Novoradovskaya
- Renee L. Sears
- Leming Shi
- Wendell Jones
- Weida Tong
- Joshua Xu
Affiliations
- Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
- Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
- Yifan Zhang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
- Rebecca Kusko
- Cellino Bio
- Samir Lababidi
- Office of Data Analytics and Research, Office of Digital Transformation, Office of the Commissioner, U.S. Food and Drug Administration
- Zehui Cao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Mingyang Chen
- Human Phenome Institute, Fudan University
- Ning Chen
- iGeneTech Bioscience Co., Ltd.
- Qiaochu Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Qingwang Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Jiacheng Dai
- Human Phenome Institute, Fudan University
- Qiang Gan
- Clinical Diagnostics Division, Thermo Fisher Scientific
- Yuechen Gao
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Mingkun Guo
- College of Chemistry, Sichuan University
- Gunjan Hariani
- Q squared Solutions Genomics
- Yujie He
- College of Chemistry, Sichuan University
- Wanwan Hou
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- He Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Garima Kushwaha
- Guardant Health, Inc.
- Jian-Liang Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health
- Jianying Li
- Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences, National Institutes of Health
- Yulan Li
- College of Life Sciences, Shanghai Normal University
- Liang-Chun Liu
- Clinical Diagnostics Division, Thermo Fisher Scientific
- Ruimei Liu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Shiming Liu
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University
- Edwin Meriaux
- CMINDS Research Center, University of Massachusetts
- Mengqing Mo
- Department of Epidemiology, School of Public Health, Fudan University
- Mathew Moore
- ResearchDx
- Tyler J. Moss
- Eurofins Viracor, LLC
- Quanne Niu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Ananddeep Patel
- Eurofins Viracor Biopharma Services, Inc.
- Luyao Ren
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Nedda F. Saremi
- Agilent Technologies, Inc.
- Erfei Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Jun Shang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Ping Song
- Cancer Genomics Laboratory, Department of Genomic Medicine, MD Anderson Cancer Center
- Siqi Sun
- ResearchDx
- Brent J. Urban
- Eurofins Viracor Biopharma Services, Inc.
- Danke Wang
- Human Phenome Institute, Fudan University
- Shangzi Wang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University
- Zhining Wen
- College of Chemistry, Sichuan University
- Xiangyi Xiong
- College of Life Sciences, Shanghai Normal University
- Jingcheng Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Lihui Yin
- PathGroup
- Chao Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Ruolan Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Ambica Bhandari
- ResearchDx
- Wanshi Cai
- iGeneTech Bioscience Co., Ltd.
- Agda Karina Eterovic
- Eurofins Viracor Biopharma Services, Inc.
- Dalila B. Megherbi
- CMINDS Research Center, University of Massachusetts
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University
- Chen Suo
- Department of Epidemiology, School of Public Health, Fudan University
- Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Natalia Novoradovskaya
- Agilent Technologies, Inc.
- Renee L. Sears
- Velsera
- Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Human Phenome Institute, Fudan University
- Wendell Jones
- Q squared Solutions Genomics
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
- DOI
- https://doi.org/10.1038/s41598-024-57439-7
- Journal volume & issue
- Vol. 14, no. 1, pp. 1 – 9
Abstract
Accurate indel calling plays an important role in precision medicine. A benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but that known-positive set contained only a limited number of indels. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review process, completed by 42 reviewers, two advisors, and a judging panel of three researchers, enriched the known indel set with 516 additional indels. The extended benchmarking indel set spans a wide range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used for comprehensive benchmarking of indel calling across a wider range of VAFs, particularly low VAFs. Indel length also varied, but the majority of indels were under 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although high confidence can be derived from the robust study design and meticulous human review, this extended indel set has not undergone orthogonal validation. The extended benchmarking indel set, along with the indels in the previously published known-positive set, served as the truth set for benchmarking indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can help improve the performance of these pipelines.
Keywords