iScience (Apr 2025)

BERT-DomainAFP: Antifreeze protein recognition and classification model based on BERT and structural domain annotation

  • Shengzhen Chen,
  • Ping Zheng,
  • Lele Zheng,
  • Qinglong Yao,
  • Ziyu Meng,
  • Longshan Lin,
  • Xinhua Chen,
  • Ruoyu Liu

Journal volume & issue
Vol. 28, no. 4
p. 112077

Abstract

Summary: Antifreeze proteins (AFPs) help organisms adapt to low temperatures and have applications in medicine, food storage, aquaculture, and agriculture. Accurate AFP identification is challenging because AFPs are diverse in both structure and sequence. To improve prediction and classification, we propose BERT-DomainAFP, a deep learning model trained on the AntiFreezeDomains dataset, which was created with a novel annotation strategy. The model builds on pre-trained ProteinBERT and combines oversampling and undersampling to handle class imbalance in the training data. BERT-DomainAFP achieves 98.48% accuracy, the highest among existing models, and can classify different AFP types based on structural domain features. It therefore offers a promising solution for AFP recognition and classification in research and applications.
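
The summary does not say how the oversampling and undersampling were carried out, so the following is only a minimal sketch of one common approach: random resampling of each class to a shared target size. The function name `rebalance`, the `target_per_class` parameter, and the toy sequence labels are assumptions made for illustration; they are not taken from BERT-DomainAFP.

```python
import random
from collections import defaultdict


def rebalance(samples, labels, target_per_class, seed=0):
    """Randomly resample every class to exactly target_per_class items.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Undersampling: draw a subset without replacement.
            chosen = rng.sample(items, target_per_class)
        else:
            # Oversampling: keep every original item, then add
            # duplicates drawn with replacement until the target is met.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_samples, out_labels


# Toy data (made-up identifiers): 2 positive vs. 8 negative examples.
seqs = ["AFP_%d" % i for i in range(2)] + ["NEG_%d" % i for i in range(8)]
labs = ["afp"] * 2 + ["non_afp"] * 8

bal_seqs, bal_labs = rebalance(seqs, labs, target_per_class=4)
```

In this sketch the minority class is duplicated up to the target and the majority class is cut down to it, so both classes contribute equally to training. Resampling of this kind is normally applied to the training split only, so that evaluation data keep their original class distribution.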

Keywords