Scientific Reports (May 2025)

Prediction and design of thermostable proteins with a desired melting temperature

  • Purva Tijare,
  • Nishant Kumar,
  • Gajendra P. S. Raghava

DOI
https://doi.org/10.1038/s41598-025-98667-9
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 13

Abstract

The stability of proteins at high temperatures is crucial for their functionality and is measured by their melting temperature (Tm), the temperature at which 50% of the protein loses its native structure and activity. Existing methods for predicting Tm have two major limitations: first, they are often trained on redundant proteins, and second, they do not allow users to design proteins with a desired Tm. To address these limitations, we developed a regression method for predicting the Tm of proteins using 17,312 non-redundant proteins, where no two proteins are more than 40% similar. We used 80% of the data for training and testing and the remaining 20% for validation. Initially, we developed machine learning models using standard features derived from protein sequences. The best of these models, developed using Shannon entropy for all residues, achieved a Pearson correlation of 0.80 with an R² of 0.63 between the predicted and actual Tm of proteins on the validation dataset. Next, we fine-tuned large language models (e.g., ProtBert, ProtGPT2, ProtT5) on our training dataset and generated embeddings, which we then used to develop machine learning models. The best of these models, developed using ProtBert embeddings, achieved a correlation of 0.89 with an R² of 0.80 on the validation dataset. Finally, we developed an ensemble method that combines standard protein features and embeddings. One of the aims of the study is to assist the scientific community in designing proteins with a targeted melting temperature. Our standalone software, PPTstab, can be used to screen thermostable proteins at the genome level, and we demonstrated its application in identifying thermostable proteins in different organisms. A user-friendly web server and a Python package for predicting and designing thermostable proteins are available at https://webs.iiitd.edu.in/raghava/pptstab and https://github.com/raghavagps/pptstab.
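
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes a residue-level Shannon entropy feature vector for a sequence, along with the two reported evaluation metrics (Pearson correlation and R²). The abstract does not give the exact feature definition, so the per-residue term -p·log2(p) over the 20 standard amino acids is an assumption, and the function names are hypothetical rather than part of the PPTstab package.

```python
import math
from collections import Counter

# Assumption: features are computed over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_entropy_features(sequence):
    """Return one entropy term -p*log2(p) per standard amino acid.

    p is the fraction of the sequence made up of that residue.
    Residues that do not occur contribute 0.
    """
    seq = sequence.upper()
    counts = Counter(seq)
    n = len(seq)
    features = []
    for aa in AMINO_ACIDS:
        p = counts[aa] / n if n else 0.0
        features.append(-p * math.log2(p) if p > 0 else 0.0)
    return features


def pearson_r(y_true, y_pred):
    """Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Summing the 20 terms returned by `residue_entropy_features` gives the Shannon entropy of the sequence's amino acid composition. In a regression setting like the one described, such a vector would be one input row and the measured Tm the target.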

Keywords