BMC Bioinformatics (Jul 2023)

SNPeffect 5.0: large-scale structural phenotyping of protein coding variants extracted from next-generation sequencing data using AlphaFold models

  • Kobe Janssen,
  • Ramon Duran-Romaña,
  • Guy Bottu,
  • Mainak Guharoy,
  • Alexander Botzki,
  • Frederic Rousseau,
  • Joost Schymkowitz

DOI
https://doi.org/10.1186/s12859-023-05407-9
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 9

Abstract

Read online

Abstract Background Next-generation sequencing technologies yield large numbers of genetic alterations, of which a subset are missense variants that alter an amino acid in the protein product. These variants can have a potentially destabilizing effect leading to an increased risk of misfolding and aggregation. Multiple software tools exist to predict the effect of single-nucleotide variants on proteins, however, a pipeline integrating these tools while starting from an NGS data output list of variants is lacking. Results The previous version SNPeffect 4.0 (De Baets in Nucleic Acids Res 40(D1):D935–D939, 2011) provided an online database containing pre-calculated variant effects and low-throughput custom variant analysis. Here, we built an automated and parallelized pipeline that analyzes the impact of missense variants on the aggregation propensity and structural stability of proteins starting from the Variant Call Format as input. The pipeline incorporates the AlphaFold Protein Structure Database to achieve high coverage for structural stability analyses using the FoldX force field. The effect on aggregation-propensity is analyzed using the established predictors TANGO and WALTZ. The pipeline focuses solely on the human proteome and can be used to analyze proteome stability/damage in a given sample based on sequencing results. Conclusion We provide a bioinformatics pipeline that allows structural phenotyping from sequencing data using established stability and aggregation predictors including FoldX, TANGO, and WALTZ; and structural proteome coverage provided by the AlphaFold database. The pipeline and installation guide are freely available for academic users on https://github.com/vibbits/snpeffect and requires a computer cluster.

Keywords