Genomics, Proteomics & Bioinformatics (Feb 2022)
Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants
- Jiadong Lin,
- Xiaofei Yang,
- Walter Kosters,
- Tun Xu,
- Yanyan Jia,
- Songbo Wang,
- Qihui Zhu,
- Mallory Ryan,
- Li Guo,
- Chengsheng Zhang,
- Charles Lee,
- Scott E. Devine,
- Evan E. Eichler,
- Kai Ye,
- Mark B. Gerstein,
- Ashley D. Sanders,
- Michael C. Zody,
- Michael E. Talkowski,
- Ryan E. Mills,
- Jan O. Korbel,
- Tobias Marschall,
- Peter Ebert,
- Peter A. Audano,
- Bernardo Rodriguez-Martin,
- David Porubsky,
- Marc Jan Bonder,
- Arvis Sulovari,
- Jana Ebler,
- Weichen Zhou,
- Rebecca Serra Mari,
- Feyza Yilmaz,
- Xuefang Zhao,
- PingHsun Hsieh,
- Joyce Lee,
- Sushant Kumar,
- Tobias Rausch,
- Yu Chen,
- Zechen Chong,
- Katherine M. Munson,
- Mark J.P. Chaisson,
- Junjie Chen,
- Xinghua Shi,
- Aaron M. Wenger,
- William T. Harvey,
- Patrick Hasenfeld,
- Allison Regier,
- Ira M. Hall,
- Paul Flicek,
- Alex R. Hastie,
- Susan Fairley
Affiliations
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Genome Institute, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China; Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311EZ, the Netherlands
- Xiaofei Yang
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Walter Kosters
- Leiden Institute of Advanced Computer Science, Faculty of Science, Leiden University, Leiden 2311EZ, the Netherlands
- Tun Xu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Yanyan Jia
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Songbo Wang
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Mallory Ryan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Li Guo
- MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Chengsheng Zhang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Precision Medicine Center, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Precision Medicine Center, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China
- Scott E. Devine
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98119, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China; Genome Institute, the First Affiliated Hospital of Xi’an Jiaotong University, Xi’an 710061, China; The School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China; Corresponding author.
- Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Ashley D. Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Michael C. Zody
- New York Genome Center, New York, NY 10013, USA
- Michael E. Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Ryan E. Mills
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jan O. Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany
- Peter A. Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- David Porubsky
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany
- Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany; Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), D-69120 Heidelberg, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany
- Weichen Zhou
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, D-40225 Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06030, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA
- Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
- Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Katherine M. Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA
- Mark J.P. Chaisson
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Junjie Chen
- Department of Computer & Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Xinghua Shi
- Department of Computer & Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Aaron M. Wenger
- Pacific Biosciences of California, Inc., Menlo Park, CA 94025, USA
- William T. Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195-5065, USA
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117 Heidelberg, Germany
- Allison Regier
- Washington University, St. Louis, MO 63108, USA
- Ira M. Hall
- Washington University, St. Louis, MO 63108, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Alex R. Hastie
- Bionano Genomics, San Diego, CA 92121, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Journal volume & issue
- Vol. 20, no. 1, pp. 205–218
Abstract
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are regarded as the simultaneous occurrence of simple structural variants. However, the commonly used model-match strategy struggles to detect the compounded mutational signals of CSVs. As a result, progress in CSV discovery has been limited compared with that for simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs and proposed Mako, which uses a bottom-up, guided, model-free strategy to detect CSVs from paired-end short-read sequencing data. Specifically, we implemented a graph-based pattern growth approach, in which the graph depicts potential breakpoint connections and pattern growth enables CSV detection without predefined models. Comprehensive evaluations on both simulated and real datasets showed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data, based on experimental and computational validation as well as manual inspection, are around 70%, with median experimental and computational breakpoint shifts of 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types: adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on CSV formation. Mako is publicly available at https://github.com/xjtu-omics/Mako.
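The idea of a graph whose edges depict potential breakpoint connections can be illustrated with a toy sketch. The code below is not Mako's algorithm or API. It assumes breakpoints are plain integer positions and that candidate connections are already given as pairs. It replaces pattern growth with a much simpler connected-component traversal, growing a subgraph outward from a seed breakpoint, and keeps only subgraphs with more than two breakpoints, matching the definition of a CSV above. All function names and positions are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (breakpoint_a, breakpoint_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def grow_subgraphs(graph):
    """Grow one subgraph per unvisited seed by following connections outward.

    This is a plain connected-component traversal, used here as a
    simplified stand-in for pattern growth.
    """
    seen = set()
    subgraphs = []
    for seed in sorted(graph):
        if seed in seen:
            continue
        component = []
        stack = [seed]
        seen.add(seed)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        subgraphs.append(sorted(component))
    return subgraphs


def candidate_csvs(edges, min_breakpoints=3):
    """Keep subgraphs with more than two breakpoints (assumed CSV criterion)."""
    return [
        subgraph
        for subgraph in grow_subgraphs(build_graph(edges))
        if len(subgraph) >= min_breakpoints
    ]


# Hypothetical breakpoint positions: three mutually connected breakpoints
# form a CSV candidate; the isolated pair looks like a simple variant.
edges = [(1000, 1500), (1500, 2200), (2200, 1000), (5000, 5400)]
result = candidate_csvs(edges)
print(result)
```

In this toy input, the three connected breakpoints are reported as one candidate, while the two-breakpoint pair is filtered out. No predefined variant model is consulted at any point, which is the property the model-free strategy relies on.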