Sensors (Jan 2023)
Online-Dynamic-Clustering-Based Soft Sensor for Industrial Semi-Supervised Data Streams
Abstract
In the era of big data, industrial process data are often generated rapidly in the form of streams. Processing such sequential, high-speed stream data in real time and predicting key quality variables has therefore become a critical issue for efficient process control and monitoring in the process industry. Traditionally, soft sensor models are built through offline batch learning and remain unchanged during online implementation. Once the process state changes, soft sensors built from historical data can no longer provide accurate predictions. In practice, industrial process data streams often exhibit nonlinearity, time-varying behavior, and label scarcity, which pose great challenges for building high-performance soft sensor models. To address these issues, an online-dynamic-clustering-based soft sensor (ODCSS) is proposed for industrial semi-supervised data streams. The method automatically generates and updates clusters and deletes samples through online dynamic clustering, thus enabling online dynamic identification of process states. Meanwhile, selective ensemble learning and just-in-time learning (JITL) are combined through an adaptive switching prediction strategy, which handles both gradual and abrupt changes in process characteristics and thus alleviates the model performance degradation caused by concept drift. In addition, semi-supervised learning is introduced to exploit the information in unlabeled samples and obtain high-confidence pseudo-labeled samples that expand the labeled training set. The proposed method can effectively deal with nonlinearity, time variability, and label scarcity in the process data stream environment and thus enables reliable target variable predictions. The application results from two case studies show that the proposed ODCSS soft sensor approach is superior to conventional soft sensors in a semi-supervised data stream environment.
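As a rough illustration of the just-in-time learning idea mentioned above, the following minimal sketch builds a local model on demand for each query sample. It is not the ODCSS implementation: the function name `jitl_predict`, the Euclidean similarity measure, the local linear model, and the neighborhood size `k` are all assumptions chosen for brevity.

```python
import numpy as np

def jitl_predict(X_lab, y_lab, x_query, k=5):
    """Predict y for x_query from a local linear model (illustrative JITL sketch).

    Assumptions: Euclidean distance as the similarity measure and an
    ordinary least-squares local model; the paper may use other choices.
    """
    # Select the k labeled samples closest to the query.
    dists = np.linalg.norm(X_lab - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    # Fit a linear model with an intercept on those neighbors only.
    A = np.hstack([X_lab[idx], np.ones((len(idx), 1))])
    coef, *_ = np.linalg.lstsq(A, y_lab[idx], rcond=None)
    # Evaluate the local model at the query point.
    return float(np.append(x_query, 1.0) @ coef)

# Hypothetical usage on a nonlinear toy process y = sin(x).
X = np.linspace(0.0, 3.0, 31).reshape(-1, 1)
y = np.sin(X[:, 0])
y_hat = jitl_predict(X, y, np.array([1.23]), k=5)
```

Because the local model is rebuilt for every query and then discarded, such a scheme can follow abrupt changes in the process, at the cost of a neighbor search per prediction.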
Keywords