Validation of a Machine Learning-Based IDS Design Framework Using ORNL Datasets for Power System With SCADA

Marzia Zaman; Darshana Upadhyay; Chung-Horng Lung

doi:10.1109/ACCESS.2023.3326751

IEEE Access (Jan 2023)

Affiliations

Marzia Zaman: Research and Development Department, Cistel Technology Inc., Ottawa, Canada
Darshana Upadhyay: Research and Development Department, Cistel Technology Inc., Ottawa, Canada
Chung-Horng Lung: Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada

DOI: https://doi.org/10.1109/ACCESS.2023.3326751
Journal volume & issue: Vol. 11
pp. 118414 – 118426

Abstract

Supervisory Control and Data Acquisition (SCADA) systems are widely used for remote monitoring and control of industrial processes such as oil and gas production, power generation, transmission and distribution, and water treatment. Recent advances in communication technologies have improved accessibility, control, and data availability, but their use also exposes critical infrastructures such as power systems to potential cyber threats. A Machine Learning (ML)-based Intrusion Detection System (IDS) is a promising countermeasure; however, developing ML models often requires custom methodologies for data preprocessing and training. Such a methodology is necessary for creating high-performance models that can be robustly evaluated and seamlessly integrated into real-time systems. We therefore propose an ML-based IDS design framework for a SCADA-based power system that incorporates dataset preprocessing to ensure accurate representation, data augmentation to achieve a balanced dataset, automated feature selection to reduce dimensionality, and rigorous model training and testing procedures. To validate the proposed framework, we conducted a series of experiments using a publicly available ORNL (Oak Ridge National Laboratory) dataset for a SCADA-based power system. The evaluation process uses efficient validation techniques with unseen data. The augmented dataset was built by aggregating readings from four Phasor Measurement Units (PMUs), collected over a specific time span, into a unified dataset. Among the assessed classifiers, the Random Forest (RF) model trained on the augmented and balanced dataset outperformed the others, yielding an F1 score of 94.09% when tested on unseen data.
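
The abstract names three steps that are easy to show in miniature: aggregating per-PMU readings into one dataset, balancing the classes, and scoring with F1. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the record layout, and the column-naming scheme are hypothetical, and random oversampling is a stand-in because the abstract does not say which augmentation method the paper uses.

```python
import random
from collections import defaultdict


def merge_pmu_readings(readings):
    """Combine per-PMU rows that share a timestamp into one wide row.

    `readings` is a list of (timestamp, pmu_id, {feature: value}) tuples.
    This layout is an assumption made for illustration only.
    """
    merged = defaultdict(dict)
    for timestamp, pmu_id, features in readings:
        for name, value in features.items():
            # Prefix each feature with its PMU so columns do not collide.
            merged[timestamp][f"{pmu_id}_{name}"] = value
    return [dict(row, timestamp=ts) for ts, row in sorted(merged.items())]


def oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class has equal size.

    Stand-in technique: the paper's actual augmentation may differ.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a setup like this, balancing would normally be applied to the training split only, so that the held-out data used for the reported score stays unseen and unmodified.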

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
