Molecular Genetics and Metabolism Reports (Dec 2023)
Predicting the pathogenicity of missense variants based on protein instability to support diagnosis of patients with novel variants of ARSL
Abstract
Rare diseases are estimated to affect 3.5%–5.9% of the population worldwide and are difficult to diagnose. Genome analysis is useful for diagnosis, but some variants, especially missense variants, are difficult to interpret, so tools that accurately predict the effect of missense variants are needed. Here we developed a method, “VarMeter”, to predict whether a missense variant is damaging based on Gibbs free energy and solvent-accessible surface area calculated from the AlphaFold 3D protein model. We applied this method to the whole-exome sequencing data of 900 individuals with rare or undiagnosed diseases in our in-house database and identified four who were hemizygous for missense variants of arylsulfatase L (ARSL; known to be the genetic cause of chondrodysplasia punctata 1, CDPX1). Two individuals had novel substitutions, Ser89 to Asn (Ser89Asn) and Arg469 to Trp (Arg469Trp), predicted as “damaging” and “benign”, respectively; the other two had Arg111 to His (Arg111His) and Gly117 to Arg (Gly117Arg) substitutions, predicted as “damaging” and “possibly damaging”, respectively, both of which were previously reported in patients showing clinical manifestations of CDPX1. Expression and analysis of the missense variant proteins showed that the predicted pathogenic variants (Ser89Asn, Arg111His, and Gly117Arg) had complete loss of sulfatase activity and reduced protease resistance due to destabilization of the protein structure, whereas the predicted benign variant (Arg469Trp) had activity and protease resistance comparable to those of wild-type ARSL. The individual with the novel pathogenic Ser89Asn variant exhibited characteristics of CDPX1, whereas the individual with the benign Arg469Trp variant exhibited no such characteristics. These findings demonstrate that VarMeter may be used to predict the deleteriousness of variants found in genome sequencing data and thereby support disease diagnosis.
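The abstract names the two structural features used by VarMeter but not the decision rule that combines them. As a rough illustration only, a three-way call of the kind reported above ("damaging", "possibly damaging", "benign") might be sketched as follows. The cutoff values, the reading of the free-energy feature as a destabilization score, the combination logic, and all names in the code are assumptions made for this sketch and are not taken from the published method.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration (NOT from the paper).
DDG_CUTOFF_KCAL_PER_MOL = 2.0  # free-energy change treated as destabilizing
RELATIVE_SASA_CUTOFF = 0.25    # exposed fraction at or below which a residue counts as buried


@dataclass(frozen=True)
class VariantFeatures:
    """Structure-derived features for one missense variant (hypothetical container)."""
    name: str
    ddg: float            # assumed: predicted free-energy change on mutation, kcal/mol
    relative_sasa: float  # 0.0 (fully buried) to 1.0 (fully exposed)


def classify(variant: VariantFeatures) -> str:
    """Return a three-way label from two thresholded features.

    Assumed rule: both criteria met -> "damaging"; exactly one met ->
    "possibly damaging"; neither met -> "benign".
    """
    destabilizing = variant.ddg >= DDG_CUTOFF_KCAL_PER_MOL
    buried = variant.relative_sasa <= RELATIVE_SASA_CUTOFF
    if destabilizing and buried:
        return "damaging"
    if destabilizing or buried:
        return "possibly damaging"
    return "benign"


if __name__ == "__main__":
    # Made-up inputs; these are not measurements for any ARSL variant.
    examples = [
        VariantFeatures("variant_A", ddg=3.1, relative_sasa=0.05),
        VariantFeatures("variant_B", ddg=2.4, relative_sasa=0.60),
        VariantFeatures("variant_C", ddg=0.3, relative_sasa=0.80),
    ]
    for v in examples:
        print(v.name, classify(v))
```

In practice the two inputs would come from a stability predictor and a surface-area calculation run on the AlphaFold model; which tools and thresholds VarMeter actually uses is not stated in the abstract.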