IEEE Access (Jan 2023)

Out-of-Distribution Data Generation for Fault Detection and Diagnosis in Industrial Systems

  • Jefkine Kafunah,
  • Priyanka Verma,
  • Muhammad Intizar Ali,
  • John G. Breslin

DOI
https://doi.org/10.1109/ACCESS.2023.3337658
Journal volume & issue
Vol. 11
pp. 135061 – 135073

Abstract

Read online

The emergence of Industry 4.0 has transformed modern-day factories into high-tech industrial sites through rapid automation and increased access to real-time data. Deep learning approaches possessing superior capabilities for intelligent, data-driven fault diagnosis have become critical in ensuring process safety and reliability in these industrial sites. However, such applications trained exclusively on in-distribution process data face challenges in the wake of previously unseen out-of-distribution (OOD) data in the real world. This paper addresses the challenge of out-of-distribution data detection for deep learning-based fault diagnosis models by generating synthetic data to simulate real-world anomalies not present in the training set. We propose Manifold Guided Sampling (MGS), a data-driven method for generating synthetic OOD samples from the in-distribution data-supporting manifold estimated through a deep generative model. Synthetic data from MGS enhances the model capacity for prediction uncertainty quantification, resulting in safe and reliable models for real-world industrial process monitoring. Furthermore, the MGS algorithm maintains the in-distribution data feature space as a reference point during data generation to ensure the resulting synthetic OOD data is realistic. We analyze the effectiveness of MGS through experiments conducted on the steel plates faults dataset and demonstrate that augmenting training data with synthetic data from MGS enhances the model performance in OOD detection tasks and provides robustness against dataset distributional shifts. The findings underscore the effectiveness of utilizing synthetic MGS-generated OOD data in scenarios where real-world OOD data is limited, enabling better generalization and more reliable fault detection in practical applications.

Keywords