PLoS Computational Biology (Apr 2021)

Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families.

  • Claudio Bassot,
  • Arne Elofsson

DOI
https://doi.org/10.1371/journal.pcbi.1008798
Journal volume & issue
Vol. 17, no. 4
p. e1008798

Abstract


Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to predict their contacts with direct coupling analysis. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict both intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.
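The distinction between intra-unit contacts (both residues in the same repeat unit) and inter-unit contacts (residues in different units) can be illustrated with a small sketch. It assumes that the repeat-unit boundaries are already known and that a contact predictor has produced residue pairs with contact probabilities. The function names `unit_of` and `classify_contacts`, the 0.5 probability cut-off and the minimum sequence separation of 6 are hypothetical choices for illustration only, not the pipeline used in the paper.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical representation: inclusive (start, end) residue indices of one repeat unit.
Unit = Tuple[int, int]


def unit_of(residue: int, units: Sequence[Unit]) -> Optional[int]:
    """Return the index of the repeat unit containing `residue`, or None if it lies outside all units."""
    for k, (start, end) in enumerate(units):
        if start <= residue <= end:
            return k
    return None


def classify_contacts(
    contacts: Sequence[Tuple[int, int, float]],
    units: Sequence[Unit],
    threshold: float = 0.5,     # assumed probability cut-off
    min_separation: int = 6,    # assumed minimum sequence separation |i - j|
) -> Dict[str, List[Tuple[int, int]]]:
    """Split predicted contacts (i, j, probability) into intra-unit, inter-unit and outside sets."""
    result: Dict[str, List[Tuple[int, int]]] = {"intra": [], "inter": [], "outside": []}
    for i, j, prob in contacts:
        # Drop low-confidence pairs and pairs too close in sequence.
        if prob < threshold or abs(i - j) < min_separation:
            continue
        ui, uj = unit_of(i, units), unit_of(j, units)
        if ui is None or uj is None:
            result["outside"].append((i, j))   # at least one residue is not in any unit
        elif ui == uj:
            result["intra"].append((i, j))     # both residues in the same repeat unit
        else:
            result["inter"].append((i, j))     # residues in different repeat units
    return result


if __name__ == "__main__":
    units = [(1, 30), (31, 60), (61, 90)]
    contacts = [(5, 25, 0.9), (10, 40, 0.8), (12, 14, 0.95), (50, 85, 0.3), (20, 95, 0.7)]
    print(classify_contacts(contacts, units))
```

With three hypothetical 30-residue units, the pair (5, 25) is classified as intra-unit, (10, 40) as inter-unit, and (20, 95) as outside, while (12, 14) is dropped for short separation and (50, 85) for low probability.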