Annotating Macromolecular Complexes in the Protein Data Bank: Improving the FAIRness of Structure Data

Sri Devan Appasamy; John Berrisford; Romana Gaborova; Sreenath Nair; Stephen Anyango; Sergei Grudinin; Mandar Deshpande; David Armstrong; Ivanna Pidruchna; Joseph I. J. Ellaway; Grisell Díaz Leines; Deepti Gupta; Deborah Harrus; Mihaly Varadi; Sameer Velankar

doi:10.1038/s41597-023-02778-9

Scientific Data (Dec 2023)

Affiliations

Sri Devan Appasamy: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
John Berrisford: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Romana Gaborova: CEITEC – Central European Institute of Technology, Masaryk University
Sreenath Nair: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Stephen Anyango: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Sergei Grudinin: Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK
Mandar Deshpande: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
David Armstrong: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Ivanna Pidruchna: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Joseph I. J. Ellaway: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Grisell Díaz Leines: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Deepti Gupta: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Deborah Harrus: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Mihaly Varadi: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton
Sameer Velankar: Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton

DOI: https://doi.org/10.1038/s41597-023-02778-9
Journal volume & issue: Vol. 10, no. 1
pp. 1 – 13

Abstract

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, current annotation practices lack standardised naming conventions for assemblies in the PDB, complicating the identification of instances representing the same assembly. In this study, we introduce a method leveraging resources external to the PDB, such as the Complex Portal, UniProt and Gene Ontology, to accurately describe assemblies and contextualise them within their biological settings. Employing the proposed approach, we assigned standard names to over 90% of unique assemblies in the PDB and provided persistent identifiers for each assembly. This standardisation of assembly data enhances the PDB, facilitating a deeper understanding of macromolecular complexes. Furthermore, the data standardisation improves the PDB’s FAIR attributes, fostering more effective basic and translational research and scientific education.

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/
