Water (Aug 2020)

Clustering of Time Series Water Quality Data Using Dynamic Time Warping: A Case Study from the Bukhan River Water Quality Monitoring Network

  • Seulbi Lee,
  • Jaehoon Kim,
  • Jongyeon Hwang,
  • EunJi Lee,
  • Kyoung-Jin Lee,
  • Jeongkyu Oh,
  • Jungsu Park,
  • Tae-Young Heo

DOI
https://doi.org/10.3390/w12092411
Journal volume & issue
Vol. 12, no. 9
p. 2411

Abstract

It is essential to monitor water quality for river water management because river water is used for various purposes and is directly related to the health and safety of a population. Proper installation and removal of monitoring stations is an important part of water quality monitoring and network operation efficiency. For this purpose, cluster analysis based on the calculated similarity between measuring stations can be used. In this study, we measured the similarities between 12 water quality monitoring stations of the Bukhan River. River water quality data always have a station-dependent time lag because water flows from upstream to downstream; therefore, we proposed using the Dynamic Time Warping (DTW) algorithm, which searches for the minimum distance by shifting and comparing time points, rather than the Euclidean algorithm, which compares only identical time points. Both the Euclidean and DTW algorithms were applied to nine water quality variables to identify similarities between stations, and K-medoids cluster analysis was performed based on these similarities. The Clustering Validation Index (CVI) was used to select the optimal number of clusters. Our results show that the Euclidean algorithm formed clusters by mixing mainstream and tributary stations; the mainstream stations were largely divided into three different clusters. In contrast, the DTW algorithm formed clear clusters by reflecting the characteristics of water quality and watershed. Furthermore, because the Euclidean algorithm requires the time series to have the same length, data loss was inevitable. As a result, even where clusters were the same as those obtained by DTW, the characteristics of the water quality variables in the cluster differed. The DTW analysis in this study provides useful information for understanding the similarity or difference in water parameter values between different locations. Thus, the number and location of required monitoring stations can be adjusted to improve the efficiency of field monitoring network management.
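
The contrast between the two distance measures can be illustrated with a minimal sketch. The abstract does not state which DTW variant, local cost, or software the authors used, so the code below assumes the textbook formulation with an absolute-difference local cost and no window constraint. The function names and the two short series are hypothetical and are not data from the study.

```python
import math


def euclidean_distance(a, b):
    """Compare the two series at identical time points only.

    Raises ValueError when the lengths differ, which is why unequal
    records have to be trimmed before this measure can be used.
    """
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Textbook DTW (assumed variant): absolute-difference local cost,
    steps of (1,0), (0,1), (1,1), and no warping window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Hypothetical readings: the downstream station records the same pulse
# two time steps after the upstream station.
upstream = [1.0, 1.0, 3.0, 5.0, 3.0, 1.0, 1.0, 1.0, 1.0]
downstream = [1.0, 1.0, 1.0, 1.0, 3.0, 5.0, 3.0, 1.0, 1.0]

euclid = euclidean_distance(upstream, downstream)
dtw = dtw_distance(upstream, downstream)
print(f"Euclidean: {euclid:.2f}, DTW: {dtw:.2f}")
```

In this toy case the warping path absorbs the lag completely, so the DTW distance is zero while the Euclidean distance is not. The two numbers use different local costs and are not on the same scale; the point is only that DTW treats a lagged copy as similar. Because `dtw_distance` also accepts series of different lengths, a pairwise DTW matrix over stations could be passed to a K-medoids routine without trimming records.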

Keywords