Data in Brief (Dec 2024)

A dataset to train intrusion detection systems based on machine learning models for electrical substations

  • Esteban Damián Gutiérrez Mlot,
  • Jose Saldana,
  • Ricardo J. Rodríguez,
  • Igor Kotsiuba,
  • Carlos Gañán

Journal volume & issue
Vol. 57
p. 111153

Abstract

The growing integration of Information and Communication Technology into Operational Technology environments in electrical substations exposes them to new cybersecurity threats. This paper presents a comprehensive dataset of substation traffic, aimed at improving the training and benchmarking of machine-learning-based Intrusion Detection Systems (IDS) installed in these facilities. The dataset includes raw network captures and flows from real substations, filtered and anonymized to ensure privacy. It covers the main protocols and standards used in substation environments: IEC61850, IEC104, NTP, and PTP. Additionally, the dataset includes traces obtained during several cyberattacks simulated in a controlled laboratory environment, providing a resource for developing and testing machine learning models for cybersecurity applications in substations. A set of complementary tools for dataset creation and preprocessing is also included to standardize the methodology and support consistency and reproducibility. In summary, the dataset addresses the need for high-quality, targeted data for tuning IDS at electrical substations and contributes to the advancement of secure and reliable power distribution networks.

Keywords