IEEE Access (Jan 2019)

QuGAN: Quasi Generative Adversarial Network for Tibetan Question Answering Corpus Generation

  • Yuan Sun,
  • Chaofan Chen,
  • Tianci Xia,
  • Xiaobing Zhao

DOI
https://doi.org/10.1109/ACCESS.2019.2934581
Journal volume & issue
Vol. 7
pp. 116247–116255

Abstract

In recent years, large-scale open Chinese and English question answering (QA) corpora have provided important support for applying deep learning to Chinese and English QA systems. However, for low-resource languages such as Tibetan, it is difficult to construct satisfactory QA systems owing to the lack of large-scale Tibetan QA corpora. To address this problem, this paper proposes a QA corpus generation model called QuGAN, which combines Quasi-Recurrent Neural Networks (QRNN) and Reinforcement Learning. The QRNN serves as the generator of a Generative Adversarial Network, which speeds up text generation. At the same time, the reward strategy and the Monte Carlo search strategy are optimized to update the generator network effectively. Finally, we use the Bidirectional Encoder Representations from Transformers (BERT) model to correct the generated questions at the grammatical level. The experimental results show that our model can generate a certain amount of effective Tibetan QA corpus, with the BLEU-2 score increasing by 13.07% over the baseline; the speed of the model is also greatly improved.
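The reinforcement-learning step described in the abstract follows the SeqGAN-style pattern of rewarding a partially generated sequence by Monte Carlo rollouts scored by the discriminator. The toy sketch below illustrates that pattern only; the vocabulary, the uniform rollout policy, and the stub discriminator are placeholders and are not taken from the paper (QuGAN's actual generator is a QRNN and its discriminator is learned).

```python
import random

# Illustrative toy vocabulary (placeholder, not from the paper).
VOCAB = ["ka", "kha", "ga", "nga"]

def rollout_policy(prefix, target_len, rng):
    """Complete a partial sequence by sampling tokens uniformly
    (a stand-in for sampling from the generator's distribution)."""
    seq = list(prefix)
    while len(seq) < target_len:
        seq.append(rng.choice(VOCAB))
    return seq

def discriminator(seq):
    """Stub scorer: fraction of distinct tokens. A real discriminator
    would output the probability that the sequence is human-written."""
    return len(set(seq)) / len(seq)

def mc_reward(prefix, target_len, n_rollouts=16, seed=0):
    """Monte Carlo search reward for a partial sequence: the average
    discriminator score over n_rollouts sampled completions. This
    reward drives the policy-gradient update of the generator."""
    rng = random.Random(seed)
    scores = [discriminator(rollout_policy(prefix, target_len, rng))
              for _ in range(n_rollouts)]
    return sum(scores) / n_rollouts

# Reward for the partial question ["ka", "kha"] with 6-token completions.
reward = mc_reward(["ka", "kha"], target_len=6)
```

In training, such a reward would be computed at every generation step so that intermediate tokens, not just finished sequences, receive a learning signal.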

Keywords