Pareto Optimal Solutions for Network Defense Strategy Selection Simulator in Multi-Objective Reinforcement Learning

Yang Sun; Yun Li; Wei Xiong; Zhonghua Yao; Krishna Moniz; Ahmed Zahir

doi:10.3390/app8010136

Applied Sciences (Jan 2018)

Affiliations

Yang Sun: Science and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101400, China
Yun Li: Network Management Center, CAPF, Beijing 100089, China
Wei Xiong: Science and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101400, China
Zhonghua Yao: Science and Technology on Complex Electronic System Simulation Laboratory, Space Engineering University, Beijing 101400, China
Krishna Moniz: College of Humanities and Sciences, University of Montana, 32 Campus Dr, Missoula, MT 59812, USA
Ahmed Zahir: Centre for Instructional Design and Technology, Open University Malaysia, Jalan Tun Ismail, 50480 Kuala Lumpur, Malaysia

DOI: https://doi.org/10.3390/app8010136
Journal volume & issue: Vol. 8, no. 1, p. 136

Abstract

Using Pareto optimization in Multi-Objective Reinforcement Learning (MORL) leads to better learning results for network defense games. This is particularly useful for network security agents, who must often balance several goals when choosing which action to take in defense of a network. If the defender knows the preferred reward distribution, the advantages of Pareto optimization can be retained by applying a scalarization algorithm before running the MORL. In this paper, we simulate a network defense scenario by creating a multi-objective zero-sum game, use Pareto optimization and MORL to determine optimal solutions, and compare those solutions with the results of different scalarization approaches. We build a Pareto Defense Strategy Selection Simulator (PDSSS) system to assist network administrators in decision-making, specifically in defense strategy selection. The experimental results show that the Satisficing Trade-Off Method (STOM) scalarization approach performs better than linear scalarization or the GUESS method. The results of this paper can aid network security agents attempting to find an optimal defense policy for network security games.
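
To make two terms from the abstract concrete, the following is a minimal Python sketch of Pareto dominance and linear scalarization over a handful of reward vectors. It is not the PDSSS implementation: the reward values, the weights, and the function names (`dominates`, `pareto_front`, `linear_scalarization`) are hypothetical and chosen only for illustration, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence, Tuple

Reward = Tuple[float, ...]  # one value per objective, all to be maximized


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Reward]) -> List[Reward]:
    """Keep only the reward vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def linear_scalarization(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a reward vector into one number using a weighted sum."""
    return sum(w * r for w, r in zip(weights, reward))


# Hypothetical two-objective rewards for five candidate defense strategies.
candidates: List[Reward] = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0), (2.0, 1.5)]

# Without a known preference, the defender is left with the whole Pareto front.
front = pareto_front(candidates)

# With a known preference (assumed weights 0.7 and 0.3), scalarization picks one point.
weights = (0.7, 0.3)
best = max(front, key=lambda r: linear_scalarization(r, weights))

print("Pareto front:", front)
print("Preferred strategy reward:", best)
```

The sketch shows why a known preference helps: the Pareto front still contains several incomparable options, while a scalarization function reduces the choice to a single one. STOM and GUESS would replace the weighted sum with a different scalarization function; they are not sketched here.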

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
