Land (Sep 2023)

Digital Mapping of Soil Organic Carbon Using Machine Learning Algorithms in the Upper Brahmaputra Valley of Northeastern India

  • Amit Kumar,
  • Pravash Chandra Moharana,
  • Roomesh Kumar Jena,
  • Sandeep Kumar Malyan,
  • Gulshan Kumar Sharma,
  • Ram Kishor Fagodiya,
  • Aftab Ahmad Shabnam,
  • Dharmendra Kumar Jigyasu,
  • Kasthala Mary Vijaya Kumari,
  • Subramanian Gandhi Doss

DOI
https://doi.org/10.3390/land12101841
Journal volume & issue
Vol. 12, no. 10
p. 1841

Abstract

Soil organic carbon (SOC) is a crucial indicator of ecosystem health and soil quality. Machine learning (ML) models that predict soil quality from environmental parameters are becoming more prevalent. However, studies have yet to examine how well each ML technique performs when predicting and mapping SOC, particularly at high spatial resolutions. In this study, SOC was predicted for the Lakhimpur district of the upper Brahmaputra Valley of Assam, India. Model predictors included topographic variables generated from the SRTM DEM and vegetation and soil indices derived from Landsat satellite images. Four ML models, Random Forest (RF), Cubist, Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM), were used to predict SOC for the top layer of soil (0–15 cm) at a 30 m resolution. The descriptive statistics of the calibration and validation sets were close to those of the total dataset, indicating that both subsets represented the complete sample. The measured SOC content varied from 0.10 to 1.85%. The RF model performed best in both the calibration and validation sets (R²c = 0.966, RMSEc = 0.159%, R²v = 0.418, RMSEv = 0.377%). The SVM model had the second-lowest accuracy, explaining 47% of the variation in the calibration set (R²c = 0.471, RMSEc = 0.293%, R²v = 0.081, RMSEv = 0.452%), while the Cubist model fared the poorest in both the calibration and validation sets. The most important variable in the RF model for predicting SOC was elevation, followed by MAT and MRVBF. The most important variables for the Cubist model were slope, TRI, MAT, and Band4. AP and LS were the most important variables in the XGBoost and SVM models. The predicted SOC ranged from 0.44 to 1.35%, 0.031 to 1.61%, 0.035 to 1.71%, and 0.47 to 1.36% in the RF, Cubist, XGBoost, and SVM models, respectively. Compared with the other ML models, RF was optimal (high accuracy and low uncertainty) for predicting SOC in the investigated region. According to the present modeling results, SOC may be determined simply and accurately. In general, the high-resolution maps might be helpful for decision-makers, stakeholders, and applicants in sericultural management practices towards precision sericulture.
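
As an illustration of the two accuracy measures reported above, the following minimal sketch computes RMSE and R² for a set of observed and predicted SOC values. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not state which formula the authors used. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (here, % SOC)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical SOC values (%) for illustration only, not data from the study.
observed_soc = [0.40, 0.80, 1.20, 1.60]
predicted_soc = [0.50, 0.70, 1.30, 1.50]

print(f"RMSE = {rmse(observed_soc, predicted_soc):.3f}%")
print(f"R2   = {r_squared(observed_soc, predicted_soc):.3f}")
```

Applying the same two functions separately to the calibration samples and to the held-out validation samples would give the paired values (R²c, RMSEc) and (R²v, RMSEv). A large gap between the pairs, as reported for RF, shows that the model fits the calibration data much more closely than unseen data.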

Keywords