Journal of Hebei University of Science and Technology (Apr 2019)
Abstractive summarization model considering hybrid lexical features
Abstract
In order to use lexical features (including n-gram and part of speech information) to identify more key vocabulary content in the summarization generation process to further improve the quality of the summarization, an algorithm based on sequence-to-sequence (Seq2Seq) structure and attention mechanism and combining lexical features is proposed. The input layer of the algorithm combines the part of speech vector with the word vector, which is the input of the encoder layer. The encoder layer is composed of bi-directional LSTM, and the context vector is composed of the output of the encoder and the lexical feature vector extracted from the convolution neural network. The convolutional neural network layer in the model controls the lexical information, the bi-directional LSTM controls the sentence information, and the decoder layer uses unidirectional LSTM to decode the context vector and generates the summarization. The experiments on public dataset and the self-collected dataset show that the performance of the summarization generation model considering lexical feature is better than that of the contrast model. The ROUGE-1, ROUGE-2 and ROUGE-L scores on the public dataset are improved by 0.024, 0.033 and 0.030, respectively. Therefore, the generation of summarization is not only related to the semantics and themes of the article, but also to the lexical features.The proposed model provides a certain reference value in the research of generating summarization of integrating key infromation.
Keywords