International Soil and Water Conservation Research (Mar 2024)

Remote estimates of suspended particulate matter in global lakes using machine learning models

  • Zhidan Wen,
  • Qiang Wang,
  • Yue Ma,
  • Pierre Andre Jacinthe,
  • Ge Liu,
  • Sijia Li,
  • Yingxin Shang,
  • Hui Tao,
  • Chong Fang,
  • Lili Lyu,
  • Baohua Zhang,
  • Kaishan Song

Journal volume & issue
Vol. 12, no. 1
pp. 200 – 216

Abstract

Suspended particulate matter (SPM) in lakes strongly affects light propagation and aquatic ecosystem productivity, and it co-varies with nutrients, heavy metals, and micro-pollutants in water. SPM absorbs and backscatters light strongly, which ultimately affects the water-leaving signals that satellite sensors detect. Simple regression models based on specific bands or band ratios have been widely used to estimate SPM, with moderate accuracy. There is still room to improve model accuracy, and machine learning models may capture the non-linear relationships between spectral variables and SPM in water. We assembled more than 16,400 in situ SPM measurements from lakes on six continents (all except Antarctica), of which 9640 samples were matched with Landsat overpasses within ±7 days. Seven machine learning algorithms and two simple regression methods (linear and partial least squares models) were used to estimate SPM in lakes, and their performance was compared. To overcome the problem of imbalanced datasets in regression, the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN) was adopted. The comparison showed that the gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) models demonstrated good spatiotemporal transferability with the SMOGN-processed dataset and have the potential to map SPM in different years from good-quality Landsat land surface reflectance images. Of all the tested modeling approaches, the GBDT model was accurately calibrated (n = 6428, R2 = 0.95, MAPE = 29.8%) with SPM collected in 2235 lakes across the world, and its validation (n = 3214, R2 = 0.84, MAPE = 38.8%) also showed stable performance. The RF model also performed well on the calibration (R2 = 0.93) and validation (R2 = 0.86, MAPE = 24.2%) datasets.
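The ±7-day match-up between in situ samples and Landsat overpasses can be illustrated with a minimal sketch. The function name `match_overpass`, the tie-breaking rule, and the dates are hypothetical and not taken from the study, which does not describe its matching code here.

```python
from datetime import date


def match_overpass(sample_date, overpass_dates, window_days=7):
    """Return the overpass date closest to sample_date within +/- window_days.

    Returns None when no overpass falls inside the window. On a tie, the
    overpass that appears first in overpass_dates is kept (an assumption).
    """
    best = None
    best_gap = None
    for overpass in overpass_dates:
        gap = abs((overpass - sample_date).days)
        if gap <= window_days and (best_gap is None or gap < best_gap):
            best, best_gap = overpass, gap
    return best


# Hypothetical overpass dates spaced 16 days apart (Landsat revisit period).
overpasses = [date(2019, 7, 1), date(2019, 7, 17), date(2019, 8, 2)]

print(match_overpass(date(2019, 7, 20), overpasses))  # 3 days after the 17 July overpass
print(match_overpass(date(2019, 7, 9), overpasses))   # 8 days from both neighbours
```

With a 16-day revisit cycle, a sample taken exactly midway between two overpasses lies 8 days from each and finds no match under a 7-day window, which is one reason fewer samples are matched than are collected.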
We applied the GBDT and RF models to map SPM in typical lakes and obtained satisfactory results. In addition, the GBDT model was evaluated against historical SPM measurements coincident with different Landsat sensors (L5-TM, L7-ETM+, and L8-OLI). The model therefore has the potential to map lake SPM for monitoring temporal variations and to track lake SPM dynamics over approximately the past four decades (1984–2021), since Landsat-5/TM was launched in 1984.
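The R2 and MAPE figures quoted in the abstract can be read against the following minimal sketch. It assumes R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and MAPE is the mean absolute percentage error relative to the observed value; the abstract does not state which definitions the authors used, and the sample values are invented for illustration.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Invented SPM values (e.g. in mg/L), not data from the study.
observed = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 44.0, 72.0]

print(f"R2 = {r_squared(observed, predicted):.2f}")
print(f"MAPE = {mape(observed, predicted):.1f}%")
```

Because MAPE weights each sample by its relative error while R2 is dominated by the largest values, a model can show a high R2 together with a comparatively large MAPE when low-SPM samples are predicted poorly in relative terms.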

Keywords