Empirical Comparison of Approaches for Mitigating Effects of Class Imbalances in Water Quality Anomaly Detection

Eustace M. Dogo; Nnamdi I. Nwulu; Bhekisipho Twala; Clinton Ohis Aigbavboa

doi:10.1109/ACCESS.2020.3038658

IEEE Access (Jan 2020)

Affiliations

Eustace M. Dogo: Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
Nnamdi I. Nwulu: Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
Bhekisipho Twala: Faculty of Engineering and the Built Environment, Durban University of Technology, Durban, South Africa
Clinton Ohis Aigbavboa: Sustainable Human Settlement and Construction Research Centre, Faculty of Engineering and the Built Environment, University of Johannesburg, Johannesburg, South Africa

DOI: https://doi.org/10.1109/ACCESS.2020.3038658
Journal volume & issue: Vol. 8
pp. 218015 – 218036

Abstract

Imbalanced class distribution and missing data are two common problems in the water quality anomaly detection domain. Learning algorithms trained on an imbalanced dataset can yield an inflated classification accuracy driven by a bias towards the majority class at the expense of the minority class. Missing values, on the other hand, can complicate the training of classifiers during data analysis. These two problems pose substantial challenges to the performance of learning algorithms in real-life water quality anomaly detection problems, and therefore need to be carefully considered and addressed to achieve better performance. In this paper, the performance of several combinations of techniques for dealing with imbalanced classes, in the context of a binary-imbalanced water quality anomaly detection problem with missing values, is extensively compared. The methods considered include seven missing data methods and eight resampling methods, applied to ten state-of-the-art classifiers chosen for diversity in their learning philosophies. The classifiers are evaluated using stratified 5-fold cross-validation, based on three performance evaluation metrics, namely accuracy, ROC-AUC and F1-measure. Further experiments are carried out on nineteen variants of homogeneous and heterogeneous ensemble techniques embedded with resampling and missing value strategies during their training phase, as well as on an optimized deep neural network model. The experimental results show an improvement in the performance of the learning classifiers, especially when dealing with the class imbalance problem on the one hand and the incomplete data problem on the other. Furthermore, the neural network model exhibits superior performance when dealing with both problems.
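The evaluation protocol described in the abstract (impute missing values, resample the training data, score with stratified 5-fold cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the abstract does not name the seven missing data methods, the eight resampling methods or the ten classifiers, so mean imputation, random oversampling and a nearest-centroid classifier are used as simple stand-ins, and the data are synthetic. The one point the sketch tries to show is that imputation statistics and resampling are fitted on the training folds only, so the held-out fold keeps its original class ratio.

```python
import random
from statistics import mean

def make_toy_data(n=400, anomaly_rate=0.05, missing_rate=0.1, seed=1):
    """Synthetic stand-in data (hypothetical): 3 features, rare class 1, random gaps."""
    rng = random.Random(seed)
    n_anomalies = int(n * anomaly_rate)
    X, y = [], []
    for i in range(n):
        label = 1 if i < n_anomalies else 0
        centre = 3.0 if label else 0.0
        row = [rng.gauss(centre, 1.0) for _ in range(3)]
        X.append([None if rng.random() < missing_rate else v for v in row])
        y.append(label)
    return X, y

def stratified_folds(y, k, rng):
    """Split indices into k folds, keeping the class ratio roughly equal per fold."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_column_means(X):
    """Per-column means over the observed (non-None) values."""
    means = []
    for col in zip(*X):
        seen = [v for v in col if v is not None]
        means.append(mean(seen) if seen else 0.0)
    return means

def impute(X, means):
    """Mean imputation: replace each None with the given column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def random_oversample(X, y, rng):
    """Duplicate random minority rows until both classes have the same size."""
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    deficit = len(by_class[1 - minority]) - len(by_class[minority])
    extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return list(X) + extra, list(y) + [minority] * deficit

def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: one mean vector per class."""
    return {c: [mean(col) for col in zip(*[r for r, l in zip(X, y) if l == c])]
            for c in (0, 1)}

def predict(centroids, X):
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist(row, centroids[c])) for row in X]

def f1_score(y_true, y_pred):
    """F1 for the positive (anomaly) class; 0.0 when nothing is true or predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def cross_validate(X, y, k=5, seed=0):
    """Stratified k-fold CV; imputation and oversampling use training folds only."""
    rng = random.Random(seed)
    scores = []
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        means = fit_column_means([X[i] for i in train_idx])
        X_tr = impute([X[i] for i in train_idx], means)
        X_tr, y_tr = random_oversample(X_tr, [y[i] for i in train_idx], rng)
        model = fit_centroids(X_tr, y_tr)
        X_te = impute([X[i] for i in test_idx], means)  # reuse training means
        scores.append(f1_score([y[i] for i in test_idx], predict(model, X_te)))
    return scores

if __name__ == "__main__":
    X, y = make_toy_data()
    print("F1 per fold:", cross_validate(X, y))
```

Accuracy and ROC-AUC, the other two metrics named in the abstract, would be computed on the same held-out predictions; they are omitted here to keep the sketch short.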

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
