Security State Estimation for Cyber-Physical Systems against DoS Attacks via Reinforcement Learning and Game Theory

Zengwang Jin; Shuting Zhang; Yanyan Hu; Yanning Zhang; Changyin Sun

doi:10.3390/act11070192

Actuators (Jul 2022)

Affiliations

Zengwang Jin: School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China
Shuting Zhang: School of Cybersecurity, Northwestern Polytechnical University, Xi’an 710072, China
Yanyan Hu: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Yanning Zhang: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Northwestern Polytechnical University, Xi’an 710072, China
Changyin Sun: School of Automation, Southeast University, Nanjing 210096, China

DOI: https://doi.org/10.3390/act11070192
Journal volume & issue: Vol. 11, no. 7, p. 192

Abstract

This paper addresses the optimal policy selection problem for the attacker and the sensor in cyber-physical systems (CPSs) under denial-of-service (DoS) attacks. Since the sensor and the attacker have opposing goals, their interaction is modeled as a two-player zero-sum game, and the Nash equilibrium strategies are studied to obtain the optimal actions. To evaluate and quantify the gains effectively, a reinforcement learning algorithm is proposed that dynamically adjusts the corresponding strategies. Furthermore, security state estimation is introduced to evaluate the impact of offensive and defensive strategies on CPSs. In the algorithm, the ε-greedy policy is improved so that optimal choices are made only after sufficient learning, balancing exploration and exploitation. Notably, a channel reliability factor is considered in order to study CPSs in which packets can be lost for more than one reason. The reinforcement learning algorithm is designed for two scenarios: a reliable channel (packet loss is caused only by DoS attacks) and an unreliable channel (packet loss is not caused entirely by DoS attacks). Simulation results for both scenarios show that the proposed reinforcement learning algorithm converges quickly to the Nash equilibrium policies of both sides, demonstrating the applicability and effectiveness of the algorithm.
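The two ideas named in the abstract, ε-greedy action selection and a Nash equilibrium of a two-player zero-sum game, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify how the ε-greedy policy is improved, so the decay schedule, the function names, and the 2×2 payoff matrix below are all hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, eps_start=1.0, eps_min=0.01, decay=0.99):
    """Hypothetical schedule (an assumption, not taken from the paper):
    explore heavily at first, then mostly exploit once enough has been learned."""
    return max(eps_min, eps_start * decay ** step)


def pure_nash(payoff):
    """Return all pure-strategy saddle points (row, col) of a zero-sum game.

    The row player (for example the attacker) maximizes the payoff and the
    column player (for example the sensor) minimizes it. An entry is an
    equilibrium when it is the minimum of its row and the maximum of its column.
    """
    points = []
    for i, row in enumerate(payoff):
        for j, value in enumerate(row):
            column = [payoff[k][j] for k in range(len(payoff))]
            if value == min(row) and value == max(column):
                points.append((i, j))
    return points


# Hypothetical payoffs: rows are attacker actions, columns are sensor actions.
example_payoff = [[3, 1],
                  [4, 2]]
equilibria = pure_nash(example_payoff)
```

In this toy matrix neither player can gain by deviating alone from the returned row and column pair. A learning-based approach such as the one described in the abstract would estimate such values from interaction rather than read them from a known matrix, and a game without a pure-strategy saddle point would require mixed strategies, which this sketch does not cover.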

Published in Actuators

ISSN: 2076-0825 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Materials of engineering and construction. Mechanics of materials; Technology: Electrical engineering. Electronics. Nuclear engineering: Production of electric energy or power. Powerplants. Central stations
Website: http://www.mdpi.com/journal/actuators
