H2Open Journal (Jun 2023)

Infilling missing data and outliers for a conventional sewage treatment plant using a self-organizing map: a case study of Kauma Sewage Treatment Plant in Lilongwe, Malawi

  • Madalitso H. Mng'ombe,
  • Brighton Austin Chunga,
  • Eddie W. Mtonga,
  • Russel C. G. Chidya,
  • Mphatso Malota

DOI
https://doi.org/10.2166/h2oj.2023.013
Journal volume & issue
Vol. 6, no. 2
pp. 280 – 296

Abstract


Data availability is key to modeling wastewater treatment processes. However, process data are often characterized by missing values and outliers. This study applied a self-organizing map (SOM) to fill in missing values and replace outliers in wastewater treatment data from Kauma Sewage Treatment Plant in Lilongwe, Malawi. We used primary and secondary wastewater data and executed the SOM algorithm to fill missing values and replace outliers in effluent pH, biochemical oxygen demand (BOD), and dissolved oxygen (DO). The results suggest that the SOM algorithm is reliable in filling gaps in wastewater time series data with less than 50% missing values, achieving correlation coefficient (R) values of >0.90. The SOM algorithm failed to reliably fill gaps and replace outliers in time series data with >50% missing values. For instance, high mean square error (MSE) values of 3,655.57, 10.62, and 2,153.34 for pH, DO, and BOD, respectively, were registered in datasets with more than 50% missing values, while very small MSE values (MSE ≈ 0) were associated with effluent pH, BOD, and DO data with missing values of <50%. Practitioners can use this approach to improve the planning and management of wastewater treatment facilities where available data records are riddled with missing observations.

Highlights

  • Missing data impinge on the efficiency of wastewater treatment plant processes.
  • The advancement of information technology and artificial intelligence enables the infilling of missing data.
  • We proposed to infill missing data and outliers using a multivariate model called the self-organizing map.
  • Missing data and outliers are replaced with reasonable estimates.
  • The approach has provided long series data for modelling the behavior of the wastewater treatment process.
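The infilling idea summarized in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it is a minimal NumPy-only SOM, written under the assumption that gaps are stored as NaN, that variables are standardized before training, and that each gap is filled from the best-matching unit found using only the observed variables of that record. The function names (`train_som`, `som_infill`), the grid size, the learning schedule, and the synthetic pH, BOD, and DO series are all hypothetical choices made for illustration.

```python
import numpy as np


def train_som(z, grid=(6, 6), n_iter=5000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a SOM on standardized rows that may contain NaN.

    Distances and updates use only the observed (non-NaN) features of each row.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Position of every unit on the 2-D map, used by the neighbourhood function.
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], dtype=float)
    weights = rng.standard_normal((rows * cols, z.shape[1]))
    for t in range(n_iter):
        x = z[rng.integers(len(z))]
        obs = ~np.isnan(x)
        if not obs.any():
            continue  # nothing to learn from a fully missing record
        decay = 1.0 - t / n_iter
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.5)
        # Best-matching unit (BMU) on the observed features only.
        bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
        # Gaussian neighbourhood around the BMU on the map grid.
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1) / (2.0 * sigma**2))
        weights[:, obs] += lr * h[:, None] * (x[obs] - weights[:, obs])
    return weights


def som_infill(data, **som_kwargs):
    """Return a copy of `data` with NaN entries replaced by BMU weights."""
    data = np.asarray(data, dtype=float)
    mean = np.nanmean(data, axis=0)
    std = np.nanstd(data, axis=0)
    std[std == 0] = 1.0
    z = (data - mean) / std
    weights = train_som(z, **som_kwargs)
    filled = data.copy()
    for i, x in enumerate(z):
        obs = ~np.isnan(x)
        if obs.all() or not obs.any():
            continue  # complete rows need no filling; empty rows cannot be matched
        bmu = np.argmin(((weights[:, obs] - x[obs]) ** 2).sum(axis=1))
        # Map the BMU weights back to the original units of each variable.
        filled[i, ~obs] = weights[bmu, ~obs] * std[~obs] + mean[~obs]
    return filled


# Synthetic, correlated stand-ins for effluent pH, BOD, and DO (not plant data).
rng = np.random.default_rng(42)
n = 300
bod = rng.uniform(20, 120, n)
do = 8.0 - 0.04 * bod + rng.normal(0, 0.3, n)
ph = 7.0 + 0.005 * bod + rng.normal(0, 0.1, n)
truth = np.column_stack([ph, bod, do])

# Blank one variable in 30% of the records, so every row keeps some observations.
gappy = truth.copy()
rows_hit = rng.choice(n, size=n * 3 // 10, replace=False)
cols_hit = rng.integers(0, 3, size=rows_hit.size)
gappy[rows_hit, cols_hit] = np.nan

filled = som_infill(gappy)
missing = np.isnan(gappy)

# Score the infilled values against the withheld truth, per variable.
for j, name in enumerate(["pH", "BOD", "DO"]):
    m = missing[:, j]
    mse = np.mean((filled[m, j] - truth[m, j]) ** 2)
    r = np.corrcoef(filled[m, j], truth[m, j])[0, 1]
    print(f"{name}: MSE={mse:.3f}, R={r:.3f}")
```

Outliers could be handled with the same routine by first flagging them (for example, with a threshold chosen by the analyst) and setting them to NaN before calling `som_infill`; that flagging rule is left out here because the abstract does not specify one. Raising the share of blanked values in the synthetic series is one way to explore how MSE and R degrade as gaps grow.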

Keywords