Frontiers in Big Data (Jun 2020)

Deep Learning Optimizes Data-Driven Representation of Soil Organic Carbon in Earth System Model Over the Conterminous United States

  • Feng Tao,
  • Zhenghu Zhou,
  • Yuanyuan Huang,
  • Qianyu Li,
  • Xingjie Lu,
  • Shuang Ma,
  • Xiaomeng Huang,
  • Yishuang Liang,
  • Gustaf Hugelius,
  • Lifen Jiang,
  • Russell Doughty,
  • Zhehao Ren,
  • Yiqi Luo

DOI
https://doi.org/10.3389/fdata.2020.00017
Journal volume & issue
Vol. 3

Abstract

Soil organic carbon (SOC) is a key component of the global carbon cycle, yet Earth system models do not represent it well enough to accurately predict global carbon dynamics in response to climate change. This study integrated deep learning, data assimilation, 25,444 vertical soil profiles, and the Community Land Model version 5 (CLM5) to optimize the model representation of SOC over the conterminous United States. We first constrained parameters in CLM5 using observations of vertical profiles of SOC, both in a batch mode (using all individual soil layers in one batch) and at individual sites (site-by-site). The parameter values estimated from the site-by-site data assimilation were then either randomly sampled (random-sampling) to generate continentally homogeneous (constant) parameter values, or maximally preserved in their spatially heterogeneous distributions (parameter values varying to match the spatial patterns from the site-by-site data assimilation), so as to optimize the spatial representation of SOC in CLM5 through a deep learning technique (a neural network) over the conterminous United States. Comparing the spatial distributions of SOC modeled by CLM5 against observations showed predictive accuracy increasing from default CLM5 settings (R² = 0.32) to randomly sampled (R² = 0.36), one-batch estimated (R² = 0.43), and deep learning optimized (R² = 0.62) parameter values. Although CLM5 with parameter values derived from the random-sampling and one-batch methods substantially corrected the overestimation of SOC storage produced with default model parameters, considerable geographical biases remained. CLM5 with the spatially heterogeneous parameter values optimized by the neural network method had the lowest estimation error and the smallest geographical biases across the conterminous United States. Our study indicates that deep learning combined with data assimilation can significantly improve the representation of SOC by complex land biogeochemical models.
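
The R² values quoted in the abstract compare modeled SOC against observed SOC. As a rough illustration of what such a score measures, the sketch below computes a coefficient of determination as 1 - SS_res / SS_tot. This definition is an assumption: the abstract does not say which form of R² the authors used (it could equally be a squared correlation coefficient). The function name `r_squared` and the numbers are hypothetical and are not data from the study.

```python
def r_squared(observed, modeled):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumed definition for illustration; the study may use another form.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sum of squares: mismatch between model and observation.
    ss_res = sum((o - m) ** 2 for o, m in zip(observed, modeled))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up SOC values, illustrative only (not taken from the paper).
observed = [2.0, 4.0, 6.0, 8.0]
perfect = list(observed)                # model reproduces observations
biased = [o + 2.0 for o in observed]    # model overestimates uniformly

score_perfect = r_squared(observed, perfect)
score_biased = r_squared(observed, biased)
```

Under this definition a uniform overestimation lowers the score even when the spatial pattern is reproduced exactly, which is one way a systematic bias in SOC storage can depress R².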

Keywords