IEEE Access (Jan 2024)
A Process Monitoring Framework for Imbalanced Big Data: A Wastewater Treatment Plant Case Study
Abstract
In recent years, process monitoring frameworks have used big data analytics to offer a more realistic interpretation of systems. Nevertheless, managing large datasets and providing reliable responses remain common obstacles to using such frameworks. In practice, faulty conditions are less prevalent than normal conditions, so coping with an imbalanced data distribution is another challenge studied here. This paper presents a fault detection framework that addresses both imbalanced data distribution and big data complexity for wastewater treatment plants (WWTPs). The fault scenarios implemented for the WWTP include distortions in the process and in the equipment, both individually and together. For this purpose, an advanced preprocessing stage is designed, consisting of a measurement selection method and an under-sampling algorithm. First, the measurements that convey a fair amount of information about the different fault scenarios are selected. Subsequently, a novel under-sampling approach removes a number of data points from the majority class (normal conditions). The down-sampling strategy is designed to trade off the amount of data eliminated against the information lost. The resulting features are then fed into a typical neural network classifier for decision making. The area under the curve (AUC) and the geometric mean (Gmean) serve as indicators of the fault detection capability on imbalanced big datasets. With the proposed fault detection framework, the average AUC and Gmean for the individual faults and the fault simulation scenarios exceed 98%, whereas without the advanced preprocessing stage the obtained indicator values are below 79%.
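As a rough illustration of the two indicators named in the abstract, the sketch below computes Gmean and AUC for a binary fault detector using their standard textbook definitions. It is not the authors' implementation: the function names `g_mean` and `auc`, the label convention (1 = faulty, 0 = normal), and the toy data are all assumptions made for this example.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (1 = faulty, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC as the probability that a faulty sample outscores a normal one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy set: 8 normal samples, 2 faulty samples.
y_true = [0] * 8 + [1] * 2
always_normal = [0] * 10  # a detector that never raises an alarm

accuracy = sum(t == p for t, p in zip(y_true, always_normal)) / len(y_true)
print("accuracy:", accuracy)                       # high despite missing every fault
print("Gmean:", g_mean(y_true, always_normal))     # zero, since sensitivity is zero
```

On this toy set the always-normal detector still reaches 80% accuracy while its Gmean is zero, which is one reason such indicators are preferred over plain accuracy when faulty samples are rare.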
Keywords