Research progress of deep reinforcement learning applied to text generation

Cong XU; Qing LI; De-zheng ZHANG; Peng CHEN; Jia-rui CUI

doi:10.13374/j.issn2095-9389.2019.06.16.030

工程科学学报 (Chinese Journal of Engineering) (Apr 2020)

Affiliations

Cong XU: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Qing LI: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
De-zheng ZHANG: Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
Peng CHEN: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Jia-rui CUI: School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

DOI: https://doi.org/10.13374/j.issn2095-9389.2019.06.16.030
Journal volume & issue: Vol. 42, no. 4
pp. 399 – 411

Abstract

Following the recent achievements of Google’s artificial intelligence system in the game of Go, deep reinforcement learning (DRL) has developed considerably. DRL combines the perception ability of deep learning with the decision-making ability of reinforcement learning. Natural language processing (NLP) must represent large vocabularies and many statements, and its subtasks, such as dialogue systems and machine translation, involve many decision problems that are difficult to model. For these reasons, DRL is well suited to various NLP tasks such as named entity recognition, relation extraction, dialogue systems, image captioning, and machine translation. DRL also helps improve the framework or the training pipeline of these tasks, and notable results have been obtained. DRL is not a single algorithm or method but a paradigm. Many researchers have cast NLP tasks in this paradigm and achieved better performance. Specifically, in text generation based on the reinforcement learning paradigm, the process of producing a predicted sequence from a given source sequence can be regarded as a Markov decision process (MDP). In an MDP, an agent interacts with the environment by receiving a sequence of observations and scaled rewards and then produces the next action, that is, the next word. This gives the text generation model a decision-making ability that can lead to future success. Thus, text generation integrated with reinforcement learning is an attractive and promising research field. This study presents a comprehensive introduction and a systematic overview. First, we present the basic methods of DRL and their variations. Then, we show the main applications of DRL in text generation, trace the development of DRL, and summarize the merits and demerits of these applications. The final section enumerates some future research directions for DRL combined with NLP.
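
The MDP view of text generation described in the abstract can be illustrated with a minimal sketch. The Python code below is not taken from the paper; it assumes a toy vocabulary, a toy reference sequence, a reward equal to the fraction of matching words, and a plain REINFORCE update with a moving-average baseline. All names (`VOCAB`, `TARGET`, `sequence_reward`, `train`, `greedy_decode`) are hypothetical. As a further simplification, the state (the generated prefix) is reduced to its length, so the policy is a table of logits per position rather than a neural network.

```python
import math
import random

VOCAB = ["the", "cat", "sat", "down"]  # toy action space (assumption)
TARGET = ["the", "cat", "sat"]         # toy reference sequence (assumption)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_reward(tokens):
    """Terminal reward in [0, 1]: fraction of positions matching TARGET."""
    return sum(t == r for t, r in zip(tokens, TARGET)) / len(TARGET)


def sample_episode(logits, rng):
    """Roll out one episode: at step t the agent picks the next word."""
    actions = []
    for t in range(len(TARGET)):
        probs = softmax(logits[t])
        actions.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
    return actions


def train(episodes=5000, lr=0.2, seed=0):
    """REINFORCE on a tabular policy with a moving-average baseline."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in TARGET]
    baseline = 0.0
    for _ in range(episodes):
        actions = sample_episode(logits, rng)
        reward = sequence_reward([VOCAB[a] for a in actions])
        advantage = reward - baseline
        for t, a in enumerate(actions):
            probs = softmax(logits[t])
            for k in range(len(VOCAB)):
                # Gradient of log pi(a | t) with respect to logit k.
                grad = (1.0 if k == a else 0.0) - probs[k]
                logits[t][k] += lr * advantage * grad
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


def greedy_decode(logits):
    """Pick the highest-scoring word at every position."""
    return [VOCAB[max(range(len(VOCAB)), key=row.__getitem__)] for row in logits]


if __name__ == "__main__":
    policy = train()
    print(greedy_decode(policy))
```

The reward here arrives only once the whole sequence has been produced, which mirrors the sequence-level rewards commonly used when reinforcement learning is applied to text generation. A practical system would replace the table of logits with a neural sequence model conditioned on the source and the generated prefix.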

Published in 工程科学学报 (Chinese Journal of Engineering)

ISSN: 2095-9389 (Print)
Publisher: Science Press
Country of publisher: China
LCC subjects: Technology: Mining engineering. Metallurgy; Technology: Engineering (General). Civil engineering (General): Environmental engineering
Website: https://cje.ustb.edu.cn/indexen.htm
