Scientific Data (Feb 2025)

Prokaryotic cellulase gene clusters derived from 2,305 metagenomes

  • Bing Song,
  • Fernando D. K. Tria,
  • Josip Skejo

DOI
https://doi.org/10.1038/s41597-025-04524-9
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 5

Abstract

Cellulose is a carbon source that is widespread in nature. However, its highly complex structure makes it difficult for any organism to extract carbon from it, and only a few taxonomic groups are known to decompose cellulose. They do so by producing cellulases, a range of enzymes that break the beta-glycosidic bonds in cellulose. Cellulases were identified in 1,735 metagenomes from 225 bioprojects. The set of 12,837 metagenome-derived cellulases encompasses three catalytic functions: exoglucanases (CBH, 1,042), endoglucanases (EG, 5,685), and beta-glucosidases (βG, 6,110). All three enzymatic functions are thought to be necessary to drive the cascade of reactions that makes cellulose available as glucose. These metagenome-derived cellulases were clustered into protein families for each EC category individually, resulting in a total of 136 clusters, with the majority observed for EG (97 clusters), followed by βG (19 clusters) and CBH (19 clusters). These clusters provide a useful cellulase dataset for future research on cellulase utilization.
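
As a rough illustration of how the counts quoted in the abstract relate to one another, the following minimal Python sketch tallies the per-class cellulase counts and computes each class's share. The variable and function names are hypothetical and are not taken from the dataset or its accompanying code; only the numbers come from the abstract.

```python
# Per-class cellulase counts as quoted in the abstract.
# Dictionary and function names are illustrative assumptions, not the dataset's schema.
cellulase_counts = {"CBH": 1042, "EG": 5685, "βG": 6110}

# Cluster figures as quoted in the abstract (stated total and the EG count).
stated_cluster_total = 136
eg_clusters = 97


def percent_shares(counts):
    """Return the total and each category's share of it, in percent."""
    total = sum(counts.values())
    shares = {name: 100 * n / total for name, n in counts.items()}
    return total, shares


total, shares = percent_shares(cellulase_counts)
eg_cluster_share = 100 * eg_clusters / stated_cluster_total

print(f"total cellulases: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1f}% of cellulases")
print(f"EG: {eg_cluster_share:.1f}% of clusters")
```

The per-class counts sum to the stated 12,837 cellulases, and EG accounts for well over half of the stated 136 clusters, which matches the abstract's description of EG as the majority.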