Genome Biology (Nov 2020)

SVFX: a machine learning framework to quantify the pathogenicity of structural variants

  • Sushant Kumar,
  • Arif Harmanci,
  • Jagath Vytheeswaran,
  • Mark B. Gerstein

DOI
https://doi.org/10.1186/s13059-020-02178-x
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 21

Abstract


Although genomic structural variants (SVs) play a crucial role in many diseases, approaches for identifying pathogenic SVs are lacking. We present a mechanism-agnostic, machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
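The abstract describes the workflow only at a high level: compute features for each SV, train a model on SV call sets from diseased and healthy individuals, then use the model to score SVs. The sketch below illustrates that general idea under loudly stated assumptions. It is not the authors' implementation. The two features (fraction of the SV overlapping annotated regions, and the mean of a per-base signal such as a conservation score), the input formats, the helper names (`overlap_fraction`, `mean_signal`, `features`, `train_logistic`, `pathogenicity_score`) and the toy data are all hypothetical. A plain logistic regression, fit by gradient descent, stands in for whatever classifier SVFX actually uses. SVs from diseased individuals are treated as positives and SVs from healthy individuals as negatives.

```python
import math
from dataclasses import dataclass


@dataclass
class SV:
    """A structural variant as a half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlap_fraction(sv, regions):
    """Fraction of the SV span covered by annotated regions (e.g. gene bodies).

    Hypothetical input: `regions` is a list of (chrom, start, end) tuples that
    do not overlap each other, so the result stays within [0, 1].
    """
    covered = 0
    for chrom, start, end in regions:
        if chrom == sv.chrom:
            covered += max(0, min(end, sv.end) - max(start, sv.start))
    return covered / (sv.end - sv.start)


def mean_signal(sv, track):
    """Mean of a per-base signal (e.g. a conservation score) over the SV span.

    Hypothetical input: `track` maps a chromosome name to a list of per-base values.
    """
    values = track[sv.chrom][sv.start:sv.end]
    return sum(values) / len(values) if values else 0.0


def features(sv, regions, track):
    """Assumed two-feature vector per SV; SVFX's real feature set is much richer."""
    return [overlap_fraction(sv, regions), mean_signal(sv, track)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def pathogenicity_score(x, w, b):
    """Model output in (0, 1); higher means more similar to the disease-set SVs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by full-batch gradient descent on log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log loss with respect to the logit.
            err = pathogenicity_score(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (made up for illustration): a 100-bp chr1 with one gene at [30, 70)
# and a conserved block at [40, 60).
genes = [("chr1", 30, 70)]
conservation = {"chr1": [1.0 if 40 <= i < 60 else 0.1 for i in range(100)]}

disease_svs = [SV("chr1", 35, 65), SV("chr1", 40, 60), SV("chr1", 45, 55)]  # label 1
healthy_svs = [SV("chr1", 0, 20), SV("chr1", 75, 95), SV("chr1", 80, 100)]  # label 0

X = [features(sv, genes, conservation) for sv in disease_svs + healthy_svs]
y = [1] * len(disease_svs) + [0] * len(healthy_svs)
w, b = train_logistic(X, y)

# Score two unseen SVs: one inside the gene and conserved block, one outside both.
for sv in (SV("chr1", 42, 58), SV("chr1", 5, 15)):
    print(sv, round(pathogenicity_score(features(sv, genes, conservation), w, b), 3))
```

On this toy data the SV inside the gene and conserved block should receive a higher score than the one outside both. A real pipeline would read intervals and signal tracks from standard genomics files and use many more features and training SVs.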