QRB Discovery (Jan 2024)
Protein structures unravel the signatures and patterns of deep time evolution
Abstract
The formulation and testing of hypotheses using ‘big biology data’ often lie at the interface of computational biology and structural biology. The Protein Data Bank (PDB), which was established about 50 years ago, catalogs three-dimensional (3D) shapes of organic macromolecules and showcases a structural view of biology. The comparative analysis of the structures of homologs, particularly of proteins, from different species has significantly improved the in-depth analyses of molecular and cell biological questions. In addition, computational tools that were developed to analyze the ‘protein universe’ are providing the means for efficient resolution of longstanding debates in cell and molecular evolution. In celebrating the golden jubilee of the PDB, much has been written about the transformative impact of PDB on a broad range of fields of scientific inquiry and how structural biology transformed the study of the fundamental processes of life. Yet, the transforming influence of PDB on one field of inquiry of fundamental interest—the reconstruction of the distant biological past—has gone almost unnoticed. Here, I discuss the recent advances to highlight how insights and tools of structural biology are bearing on the data required for the empirical resolution of vigorously debated and apparently contradicting hypotheses in evolutionary biology. Specifically, I show that evolutionary characters defined by protein structure are superior compared to conventional sequence characters for reliable, data-driven resolution of competing hypotheses about the origins of the major clades of life and evolutionary relationship among those clades. Since the better quality data unequivocally support two primary domains of life, it is imperative that the primary classification of life be revised accordingly.
Keywords