IEEE Access (Jan 2025)
Detection of Adversarial Attacks Using Deep Learning and Features Extracted From Interpretability Methods in Industrial Scenarios
Abstract
The adversarial training technique has been shown to improve the robustness of Machine Learning and Deep Learning models against adversarial attacks in the Computer Vision field. However, the effectiveness of this approach remains to be proven for Anomaly Detection in industrial environments, where adversarial training suffers from critical limitations. First, the time required to train the Anomaly Detection system increases, since adversarial samples need to be generated in each epoch. Second, adversarial training blurs the decision boundary, thereby degrading the performance of the Anomaly Detection system. To overcome these limitations, we propose a novel framework that can be deployed on the constrained devices typically used in industrial scenarios. The framework relies on features extracted by applying interpretability methods to time-series data. These features are used to train a new Deep Learning model that discriminates between adversarial and non-adversarial samples. We validated two configurations of the framework in two different industrial scenarios: the Tennessee Eastman Process (TEP) and Secure Water Treatment (SWaT). We then compared our proposal with the traditional adversarial training approach. Our proposal required significantly less training time (approximately 42.5% and 90% less for TEP and SWaT, respectively). Moreover, after applying adversarial training, we observed a drop in the F1-score of the Anomaly Detection system from 0.929 to 0.707 in the TEP scenario and from 0.972 to 0.923 in the SWaT scenario. Finally, our approach was slower than adversarial training in terms of evaluation time, owing to the intensive computation of features from interpretability methods; however, this did not prevent our approach from detecting adversarial samples in real time.
Keywords