Jisuanji kexue yu tansuo (Feb 2024)
Strategy Selection and Outcome Evaluation of Three-Way Decisions Based on Reinforcement Learning
Abstract
The trisecting-acting-outcome (TAO) model of three-way decision (3WD) consists of three steps: trisecting a whole, designing action strategies, and analyzing and measuring outcomes. Existing research on outcome evaluation measures the change in outcomes before and after strategies are implemented, but it cannot yet predict which strategy will achieve the maximum effect. To narrow this gap, this paper focuses on the “acting” and “outcome” steps of the TAO model and introduces a method for strategy selection and outcome prediction in change-based three-way decision, based on Q-learning in reinforcement learning. First, the altered tri-partition and the acting step of the change-based TAO model are treated as the states and actions of reinforcement learning, respectively, and each application of an action or strategy that yields a newly altered tri-partition is regarded as one cycle. The reward generated by each cycle is calculated using cumulative prospect theory, and the interaction between the agent and the environment is represented as a Markov decision process. Second, a target reward is set, and the state in which the cumulative reward over the cycles reaches the target reward is taken as the terminal state of the Markov decision process. A Q-learning algorithm is then used to iteratively learn a set of actions that reaches the target reward in the fewest cycles, and this action set is used to predict the future utility of the change-based three-way decision. Finally, an example illustrates the applicability and effectiveness of the method.
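The pipeline described in the abstract can be sketched in Python as a small tabular Q-learning loop. This is a toy illustration under stated assumptions, not the paper's implementation: the region counts, the three strategies (`N2B`, `B2P`, `N2P`), their gains, and the target are hypothetical; only the Tversky-Kahneman value function of cumulative prospect theory is used, with probability weighting omitted because the toy outcomes are deterministic; the state is augmented with the cumulative reward so that termination stays Markov; and the learning signal is assumed to be -1 per cycle so that the best return corresponds to the fewest cycles.

```python
import random
from collections import defaultdict

# Hypothetical toy setting: a tri-partition is summarized by object counts.
ACTIONS = ("N2B", "B2P", "N2P")  # illustrative strategies, one object moved per cycle
GAIN = {"N2B": 1.0, "B2P": 2.0, "N2P": 2.5}  # assumed outcome change if the move is possible
FAIL = -0.5              # assumed outcome change if the source region is empty
TARGET = 6.0             # target cumulative reward that ends an episode
START = (2, 2, 0, 0.0)   # (neg, bnd, pos, cumulative reward)


def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Tversky-Kahneman value function (gains concave, losses amplified)."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta


def step(state, action):
    """Apply one strategy; return (next_state, cycle_reward, done)."""
    neg, bnd, pos, cum = state
    possible = {"N2B": neg > 0, "B2P": bnd > 0, "N2P": neg > 0}[action]
    if possible:
        if action == "N2B":
            neg, bnd = neg - 1, bnd + 1
        elif action == "B2P":
            bnd, pos = bnd - 1, pos + 1
        else:
            neg, pos = neg - 1, pos + 1
        reward = cpt_value(GAIN[action])
    else:
        reward = cpt_value(FAIL)
    cum = round(cum + reward, 2)  # rounded so the augmented state space stays small
    return (neg, bnd, pos, cum), reward, cum >= TARGET


def train(episodes=3000, lr=0.5, eps=0.2, max_cycles=20, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(episodes):
        state = START
        for _ in range(max_cycles):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, _, done = step(state, action)
            # Assumed learning signal: -1 per cycle, no discounting.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += lr * (-1.0 + best_next - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_plan(q, max_cycles=20):
    """Follow the learned greedy policy; return (actions, cumulative reward)."""
    state, plan = START, []
    for _ in range(max_cycles):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        plan.append(action)
        if done:
            break
    return plan, state[3]


plan, utility = greedy_plan(train())
print("strategy sequence:", plan, "predicted cumulative utility:", utility)
```

In this toy environment no two-cycle sequence can reach the target, and the only three-cycle sequences that do use `N2P` twice and `B2P` once, so the learned plan can be checked by hand. The cumulative reward of the greedy rollout plays the role of the predicted future utility.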
Keywords