Computational and Structural Biotechnology Journal (Jan 2022)

Using metagenomic data to boost protein structure prediction and discovery

  • Qingzhen Hou,
  • Fabrizio Pucci,
  • Fengming Pan,
  • Fuzhong Xue,
  • Marianne Rooman,
  • Qiang Feng

Journal volume & issue
Vol. 20
pp. 434–442

Abstract

Over the past decade, metagenomic sequencing approaches have generated an ever-increasing amount of protein sequence data at an astonishing rate. These data constitute an invaluable source of information that has been exploited in various research fields, such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of all collected metagenomic sequences have been functionally or structurally characterized, leaving most of them completely unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data have contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data are currently revolutionizing our understanding of protein science.

Keywords