Applied Sciences (Apr 2023)

Stream-DBSCAN: A Streaming Distributed Clustering Model for Water Quality Monitoring

  • Chunxiao Mu,
  • Yanchen Hou,
  • Jindong Zhao,
  • Shouke Wei,
  • Yuxuan Wu

DOI
https://doi.org/10.3390/app13095408
Journal volume & issue
Vol. 13, no. 9
p. 5408

Abstract

Read online

With the increasing use of wireless sensor networks in water quality monitoring, an enormous amount of streaming data is generated by widely deployed sensors. However, the current batch mode used for data analysis can no longer meet the diverse combination of monitoring indicators and the requirement for timely analysis results on an all-weather basis. To overcome these challenges and analyze a large amount of water quality data quickly and accurately, we propose a stream-DBSCAN distributed stream processing clustering model. First, real-time data streams are processed using the distributed stream computing framework Flink. Then, the DBSCAN clustering algorithm is applied to cluster each dataset as a different dimension of the cluster. Finally, the time distribution characteristics of the data in the same cluster are analyzed to identify the water quality variation rules. The system can extract data noise points and identify sudden deterioration of water quality. We tested the model using datasets on three water quality indices, pH, ammonia nitrogen (NH4N), and turbidity, in the Yantai Menlou Reservoir from May to August 2019. The results demonstrate that the system can efficiently and quickly perform cluster analysis on streaming data. By analyzing the clustering results, we found that the daily variation of water quality and sudden pollution events in the Menlou Reservoir are consistent with the actual situation.

Keywords