Jisuanji Kexue (Computer Science), June 2022

Automatic Summarization Model Combining BERT Word Embedding Representation and Topic Information Enhancement

  • GUO Yu-xin, CHEN Xiu-hong

DOI
https://doi.org/10.11896/jsjkx.210400101
Journal volume & issue
Vol. 49, no. 6
pp. 313–318

Abstract

Automatic text summarization helps people filter and identify information quickly, grasp the key content of news, and alleviate information overload. Mainstream abstractive summarization models are based on the encoder-decoder architecture. Given that the decoder does not fully consider text topic information when predicting the target word, and that traditional static Word2Vec word vectors cannot resolve polysemy, an automatic summarization model for Chinese short news is proposed that integrates BERT word embedding representation with topic information enhancement. The encoder applies an unsupervised algorithm to obtain text topic information and integrates it into the attention mechanism to improve the model's decoding. On the decoder side, sentence vectors extracted from the BERT pre-trained language model serve as supplementary features that provide richer semantic information. Meanwhile, a pointer mechanism is introduced to handle out-of-vocabulary words, and a coverage mechanism effectively suppresses repetition. Finally, during training, a reinforcement learning method optimizes the model directly for the non-differentiable ROUGE metric, avoiding exposure bias. Experimental results on two Chinese short news summarization datasets show that the proposed model significantly improves ROUGE scores, effectively integrates text topic information, and generates fluent, concise summaries.
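The pointer and coverage mechanisms mentioned in the abstract are standard components; the sketch below is a minimal NumPy illustration of the usual formulations (an extended-vocabulary copy distribution and a coverage penalty), not the authors' actual implementation. Function names, shapes, and the choice of NumPy are assumptions made for illustration.

```python
import numpy as np

def pointer_generator_step(p_vocab, attention, src_ids, p_gen, vocab_size):
    """One decoding step of a pointer-generator distribution (illustrative sketch).

    p_vocab:   (vocab_size,) generation distribution over the fixed vocabulary
    attention: (src_len,) attention weights over source tokens (sums to 1)
    src_ids:   (src_len,) vocabulary ids of source tokens; ids >= vocab_size
               represent out-of-vocabulary words in an extended vocabulary
    p_gen:     scalar in [0, 1], probability of generating vs. copying
    """
    extra = max(0, int(src_ids.max()) + 1 - vocab_size)  # room for OOV ids
    extended = np.zeros(vocab_size + extra)
    extended[:vocab_size] = p_gen * p_vocab
    # Scatter-add the copy distribution; np.add.at accumulates repeated ids.
    np.add.at(extended, src_ids, (1.0 - p_gen) * attention)
    return extended

def coverage_loss(attentions):
    """Coverage penalty: sum over steps of sum_i min(a_t[i], c_t[i]),
    where c_t is the sum of attention distributions from earlier steps."""
    coverage = np.zeros_like(attentions[0])
    loss = 0.0
    for a in attentions:
        loss += np.minimum(a, coverage).sum()
        coverage += a
    return loss
```

The extended distribution always sums to 1, since the generation and copy parts are mixed with weights p_gen and (1 - p_gen); repeated source tokens and OOV words simply accumulate copy probability mass at their ids.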

Keywords