Research progress of causal inference in reinforcement learning framework (强化学习框架中因果推断研究进展)

刘华玲 (LIU Hualing); 朱建亮 (ZHU Jianliang); 任青青 (REN Qingqing)

doi:10.3785/j.issn.1008-9497.2024.04.001

Zhejiang Daxue xuebao. Lixue ban (Jul 2024)

Affiliations

刘华玲 (LIU Hualing): School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
朱建亮 (ZHU Jianliang): School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620, China
任青青 (REN Qingqing): Goldman Sachs & Co. LLC, New York 10282, USA

Journal volume & issue: Vol. 51, no. 4
pp. 391 – 406

Abstract

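Causal reasoning has been studied extensively across the sciences, and recent decades have brought a number of innovations in the development and implementation of methods for determining causality. Reinforcement learning, meanwhile, is a field of machine learning concerned with how agents act in an environment to maximize cumulative reward. Embedding causal inference in the reinforcement learning framework has been an important recent advance in both causal inference and reinforcement learning methodology. Against this background, this paper first reviews the background and development of state-of-the-art deep reinforcement learning algorithms and introduces three classes of reinforcement learning frameworks: value-function-based, policy-gradient-based, and model-based. It then reviews, from five technical application perspectives, research that applies reinforcement learning to causal inference and causal identification. Finally, it emphasizes the interpretability of causal reinforcement learning and the need for applied research, and outlines future research directions.

Chinese abstract (translated): Causal inference has attracted wide attention in the sciences, and methods for determining causal relationships have seen innovation in recent years. Reinforcement learning, as a machine learning method, is mainly concerned with how an agent takes actions in an environment to maximize cumulative reward. Nesting causal inference methods within the reinforcement learning framework is an important academic advance both in causal inference and in reinforcement learning methodology. On this basis, the paper first surveys the background and development of deep reinforcement learning algorithms, introducing three classes of reinforcement learning algorithm frameworks (value-function-based, policy-gradient-based, and model-based) together with directions for combining them with causal inference. Second, it reviews research applying reinforcement learning ideas to causal inference and causal identification from five technical application perspectives. Finally, it emphasizes the data-driven efficiency and stability of causal inference within reinforcement learning frameworks and the necessity of applied research, and discusses future research directions.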

Published in Zhejiang Daxue xuebao. Lixue ban

ISSN: 1008-9497 (Print)
Publisher: Zhejiang University Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Physics
Website: https://www.zjujournals.com/sci/EN/1008-9497/home.shtml
