Jisuanji Kexue (Computer Science), June 2022

Automatic Summarization Model Combining BERT Word Embedding Representation and Topic Information Enhancement

  • GUO Yu-xin, CHEN Xiu-hong

DOI
https://doi.org/10.11896/jsjkx.210400101
Journal volume & issue
Vol. 49, no. 6
pp. 313–318

Abstract

Automatic text summarization helps people filter and identify information quickly, grasp the key content of news, and alleviate information overload. Mainstream abstractive summarization models are based on the encoder-decoder architecture. Given that the decoder does not fully consider text topic information when predicting the target word, and that traditional static Word2Vec word vectors cannot resolve polysemy, an automatic summarization model for Chinese short news is proposed that integrates BERT word embedding representation with topic information enhancement. The encoder applies an unsupervised algorithm to obtain text topic information and integrates it into the attention mechanism to improve the model's decoding. On the decoder side, sentence vectors extracted from the BERT pre-trained language model serve as supplementary features that provide richer semantic information. Meanwhile, a pointer mechanism is introduced to handle out-of-vocabulary words, and a coverage mechanism effectively suppresses repetition. Finally, during training, a reinforcement learning method optimizes the model directly for the non-differentiable ROUGE metric, avoiding exposure bias. Experimental results on two Chinese short news summarization datasets show that the proposed model significantly improves ROUGE scores, effectively integrates text topic information, and generates fluent, concise summaries.
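The pointer and coverage mechanisms mentioned in the abstract are standard components; the sketch below is a minimal NumPy illustration of the usual formulations (an extended-vocabulary copy distribution and a coverage penalty), not the authors' actual implementation. Function names, shapes, and the choice of NumPy are assumptions made for illustration.

```python
import numpy as np

def pointer_generator_step(p_vocab, attention, src_ids, p_gen, vocab_size):
    """One decoding step of a pointer-generator distribution (illustrative sketch).

    p_vocab:   (vocab_size,) generation distribution over the fixed vocabulary
    attention: (src_len,) attention weights over source tokens (sums to 1)
    src_ids:   (src_len,) vocabulary ids of source tokens; ids >= vocab_size
               represent out-of-vocabulary words in an extended vocabulary
    p_gen:     scalar in [0, 1], probability of generating vs. copying
    """
    extra = max(0, int(src_ids.max()) + 1 - vocab_size)  # room for OOV ids
    extended = np.zeros(vocab_size + extra)
    extended[:vocab_size] = p_gen * p_vocab
    # Scatter-add the copy distribution; np.add.at accumulates repeated ids.
    np.add.at(extended, src_ids, (1.0 - p_gen) * attention)
    return extended

def coverage_loss(attentions):
    """Coverage penalty: sum over steps of sum_i min(a_t[i], c_t[i]),
    where c_t is the sum of attention distributions from earlier steps."""
    coverage = np.zeros_like(attentions[0])
    loss = 0.0
    for a in attentions:
        loss += np.minimum(a, coverage).sum()
        coverage += a
    return loss
```

The extended distribution always sums to 1, since the generation and copy parts are mixed with weights p_gen and (1 - p_gen); repeated source tokens and OOV words simply accumulate copy probability mass at their ids.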

Keywords