IEEE Access (Jan 2021)

An Effective Machine Learning Scheme to Analyze and Predict the Concentration of Persistent Pollutants in the Great Lakes

  • Chunxue Wu,
  • Bin Li,
  • Naixue Xiong

DOI
https://doi.org/10.1109/ACCESS.2021.3069990
Journal volume & issue
Vol. 9
pp. 52252 – 52265

Abstract

Persistent organic pollutants (POPs) are highly toxic and difficult to degrade in natural ecosystems, and they have a severe negative impact on the ecological environment. Quantifying changes in POP concentrations in the Great Lakes is challenging. Machine learning (ML) methods are powerful predictors that have recently achieved impressive performance on time-series tasks; the ARIMA model, linear regression, the XGBoost algorithm, and Long Short-Term Memory (LSTM) networks are commonly used to estimate time-series changes. Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) have traditionally been the standard criteria for measuring the error between actual and predicted values. However, both are based on the Euclidean distance (ED), which cannot effectively measure the similarity between two time series. We propose an alternative criterion, Penalty Dynamic Time Warping (Penalty-DTW), based on Dynamic Time Warping (DTW), which accurately measures the difference between actual and predicted values. We study the benefits of Penalty-DTW over ED under the ML algorithms above. Further, to account for the uncertainty of ML algorithms, we propose combining LSTM with deep ensemble methods to quantify that uncertainty and make confident predictions. Comparing pollutant concentration predictions, we find that the improved LSTM model outperforms the other predictive models. The prediction results show that pollutant concentrations have followed a stable downward trend in recent years. We also find that pollutant concentrations correlate with the seasons, which provides useful guidance for pollution control in the Great Lakes.
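The contrast between a Euclidean-distance criterion and DTW can be illustrated with a small sketch. It uses plain, textbook DTW with an absolute-difference local cost. It is not the authors' Penalty-DTW, whose penalty term the abstract does not specify. The series `actual` and `predicted` are made-up toy values, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Lock-step Euclidean distance: compares a[i] only with b[i]."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook dynamic time warping distance (no penalty term)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b point
                cost[i][j - 1],      # b[j-1] also matches an earlier a point
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


# Toy series (hypothetical): the "prediction" has the right shape
# but lags the "actual" series by one time step.
actual = [0, 1, 2, 3, 2, 1, 0, 0]
predicted = [0, 0, 1, 2, 3, 2, 1, 0]

if __name__ == "__main__":
    print("ED :", euclidean_distance(actual, predicted))
    print("DTW:", dtw_distance(actual, predicted))
```

Because ED compares points at identical time indices, a one-step lag yields a nonzero error (here `sqrt(6)`). DTW aligns the shifted shapes and reports a distance of 0. This zero distance also suggests why a pure DTW criterion might be complemented by some penalty on time shifts. How Penalty-DTW does this is not described in this abstract.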
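The abstract does not say how the LSTM deep ensemble combines its members. One common recipe is the deep-ensemble approach of Lakshminarayanan et al. (2017). In that approach, each member predicts a Gaussian mean and variance, and the uniform mixture is collapsed to a single Gaussian. The sketch below shows only this aggregation step, assuming each (hypothetical) LSTM member has already produced a `(mean, variance)` pair for one time step. It is an assumption, not necessarily the combination rule used in the paper.

```python
import math


def combine_ensemble(means, variances):
    """Collapse a uniformly weighted Gaussian mixture into one Gaussian.

    mu*      = (1/M) * sum(mu_m)
    sigma*^2 = (1/M) * sum(sigma_m^2 + mu_m^2) - mu*^2
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one variance per member mean")
    m = len(means)
    mu = sum(means) / m
    var = sum(v + mu_m * mu_m for mu_m, v in zip(means, variances)) / m - mu * mu
    return mu, var


def interval_95(mu, var):
    """Approximate 95% interval under the single-Gaussian assumption."""
    half_width = 1.96 * math.sqrt(var)
    return mu - half_width, mu + half_width


# Hypothetical outputs of three ensemble members for one time step.
member_means = [1.0, 2.0, 3.0]
member_vars = [0.5, 0.5, 0.5]

if __name__ == "__main__":
    mu, var = combine_ensemble(member_means, member_vars)
    print(mu, var, interval_95(mu, var))
```

The combined variance equals the average member variance (aleatoric part) plus the spread of the member means (disagreement between members). As a result, the interval widens when the ensemble members disagree.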

Keywords