PDBeCIF: an open-source mmCIF/CIF parsing and processing package

Glen van Ginkel; Lukáš Pravda; José M. Dana; Mihaly Varadi; Peter Keller; Stephen Anyango; Sameer Velankar

doi:10.1186/s12859-021-04271-9

BMC Bioinformatics (Jul 2021)

Affiliations

Glen van Ginkel: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Lukáš Pravda: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
José M. Dana: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Mihaly Varadi: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Peter Keller: Global Phasing Ltd.
Stephen Anyango: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)
Sameer Velankar: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)

Journal volume & issue: Vol. 22, no. 1
pp. 1 – 7

Abstract

Background: Biomacromolecular structural data have outgrown the legacy Protein Data Bank (PDB) format, on which the scientific community relied for decades, yet its successor, the PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF), is still not widely used. One reason may be that many easy-to-use tools support only the legacy format. Another is the inherent difficulty of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations, such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, the new format must be fully adopted as soon as possible.

Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is used extensively in the processes of the Protein Data Bank in Europe (PDBe). The package is freely available from both the PyPI repository (http://pypi.org/project/pdbecif) and GitHub (https://github.com/pdbeurope/pdbecif), along with rich documentation and many ready-to-use examples.

Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with third-party libraries and adopted for broad scientific analyses.
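To make the "edge cases" mentioned above concrete, the following standard-library Python sketch tokenizes and parses a small subset of CIF syntax: data blocks, single `_category.item value` pairs, `loop_` tables, quoted values that themselves contain quote characters, and semicolon-delimited multi-line text fields. It is an illustration only and is not PDBeCIF's implementation or API. The function names `tokenize` and `parse`, the sample data, and the nested `{block: {category: {item: [values]}}}` layout are hypothetical choices for this example. Save frames, `global_` blocks and other CIF features are deliberately not handled.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical token type: (value, is_literal). Quoted strings and text fields
# are literals, so a quoted 'loop_' is data, never the loop_ keyword.
Token = Tuple[str, bool]
Block = Dict[str, Dict[str, List[str]]]


def _tokenize_line(line: str) -> Iterator[Token]:
    pos, n = 0, len(line)
    while pos < n:
        ch = line[pos]
        if ch.isspace():
            pos += 1
        elif ch == "#":
            return  # '#' at the start of a token begins a comment
        elif ch in "'\"":
            # A quoted value ends only at a matching quote followed by
            # whitespace or end of line, so 'it's quoted' is one value.
            end = pos + 1
            while True:
                end = line.find(ch, end)
                if end == -1:
                    raise ValueError(f"unterminated quoted value: {line!r}")
                if end + 1 == n or line[end + 1].isspace():
                    break
                end += 1
            yield line[pos + 1:end], True
            pos = end + 1
        else:
            end = pos
            while end < n and not line[end].isspace():
                end += 1
            yield line[pos:end], False
            pos = end


def tokenize(text: str) -> Iterator[Token]:
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].startswith(";"):
            # Text field: runs until the next line that starts with ';'.
            buf = [lines[i][1:]] if lines[i][1:] else []
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                buf.append(lines[i])
                i += 1
            if i == len(lines):
                raise ValueError("unterminated text field")
            yield "\n".join(buf), True
            yield from _tokenize_line(lines[i][1:])
        else:
            yield from _tokenize_line(lines[i])
        i += 1


def _is_reserved(tok: str) -> bool:
    low = tok.lower()
    return tok.startswith("_") or low.startswith("data_") or low == "loop_"


def _store(block: Block, tag: str, values: List[str]) -> None:
    category, _, item = tag[1:].partition(".")
    block.setdefault(category, {})[item] = values


def parse(text: str) -> Dict[str, Block]:
    """Return {block: {category: {item: [values]}}}; '?' and '.' stay as strings."""
    tokens = list(tokenize(text))
    data: Dict[str, Block] = {}
    block: Optional[Block] = None
    i = 0
    while i < len(tokens):
        tok, literal = tokens[i]
        if not literal and tok.lower().startswith("data_"):
            block = data.setdefault(tok[5:], {})
            i += 1
            continue
        if block is None or literal or not _is_reserved(tok):
            raise ValueError(f"unexpected token {tok!r}")
        if tok.lower() == "loop_":
            i += 1
            names: List[str] = []
            while i < len(tokens) and not tokens[i][1] and tokens[i][0].startswith("_"):
                names.append(tokens[i][0])
                i += 1
            values: List[str] = []
            while i < len(tokens) and (tokens[i][1] or not _is_reserved(tokens[i][0])):
                values.append(tokens[i][0])
                i += 1
            if not names or len(values) % len(names):
                raise ValueError("loop_ values do not fill whole rows")
            for k, name in enumerate(names):
                _store(block, name, values[k::len(names)])  # column k of the table
        else:  # a single '_category.item value' pair
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is None or (not nxt[1] and _is_reserved(nxt[0])):
                raise ValueError(f"missing value for {tok}")
            _store(block, tok, [nxt[0]])
            i += 2
    return data


SAMPLE = """\
data_demo
# comments are ignored
_entry.id DEMO
_struct.title
;Multi-line title that
spans two lines
;
_struct_keywords.text 'it's quoted'
loop_
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
1 "O5'" DG
2 C1' DG
3 'loop_' ?
"""

if __name__ == "__main__":
    demo = parse(SAMPLE)["demo"]
    print(demo["struct"]["title"][0])
    print(demo["atom_site"]["label_atom_id"])
```

The sample exercises several of the cases that make naive whitespace splitting fail: an apostrophe inside a quoted value (`'it's quoted'`), a prime inside an unquoted atom name (`C1'`), a quoted value that spells a keyword (`'loop_'`), and a text field spanning lines. Storing every item as a list keeps single values and `loop_` columns uniform. A production parser such as PDBeCIF must handle far more of the specification than this sketch does.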

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
