Large-scale determination of previously unsolved protein structures using evolutionary information
Sergey Ovchinnikov,
Lisa Kinch,
Hahnbeom Park,
Yuxing Liao,
Jimin Pei,
David E Kim,
Hetunandan Kamisetty,
Nick V Grishin,
David Baker
Affiliations
Sergey Ovchinnikov
Department of Biochemistry, University of Washington, Seattle, United States
Lisa Kinch
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
Hahnbeom Park
Department of Biochemistry, University of Washington, Seattle, United States
Yuxing Liao
Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
Jimin Pei
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
David E Kim
Department of Biochemistry, University of Washington, Seattle, United States
Hetunandan Kamisetty
Facebook Inc., Seattle, United States
Nick V Grishin
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States; Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
David Baker
Department of Biochemistry, University of Washington, Seattle, United States; Howard Hughes Medical Institute, University of Washington, Seattle, United States
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy that we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods; these predictions were obtained by incorporating residue–residue co-evolution information into the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the more than 400,000 proteins belonging to the 58 families. They also suggest hypotheses about mechanism for the subset of families whose function is known, and hypotheses about function for the remainder.