Nongye tushu qingbao xuebao (Jun 2023)

Think-Tank's Text Summarization Based on Combined Keywords and Contrastive Learning Training

  • CHEN Yuanyuan, WANG Lei

DOI
https://doi.org/10.13998/j.cnki.issn1002-1248.23-0419
Journal volume & issue
Vol. 35, no. 6
pp. 72 – 82

Abstract

[Purpose/Significance] Think tank reports are professional analyses and policy recommendations produced by independent research institutions. They provide decision support and serve as an important tool with which policy makers and the public can promote social progress. A summary of a think tank report gives readers a concise, clear overview so that they can quickly grasp the report's main content and conclusions, which improves the efficiency of information screening, dissemination, and knowledge transfer. At present, think tank reports vary considerably, which leads to inaccurate summaries, so existing text summarization methods urgently need improvement. This paper focuses on the characteristics of think tank reports in the context of multi-topic text summarization.

[Method/Process] To address the poor performance of existing models on think tank reports, a think tank report dataset was constructed with a web crawler, and a report summarization method based on a "combined keywords" search was proposed. First, a keyword extraction algorithm was used to extract keyword information from the original text. Second, a "combined keywords" search module based on a cross-attention mechanism was used to improve the model's ability to capture topic information in the text, and thereby the accuracy of the generated summary. Finally, to keep the model from attending excessively to keywords while ignoring the overall information in a think tank report, a contrastive learning training method was designed for the training process.

[Results/Conclusions] The experimental results show that the think tank report summarization model reached ROUGE-1, ROUGE-2 and ROUGE-L values of 48.23, 32.55 and 42.50, respectively. The summarization model with the "combined keywords" search method proposed in this study effectively addresses the inaccurate summarization caused by multi-topic texts, and it summarizes think tank reports better than other similar models. In addition, ablation experiments confirmed the effectiveness of both the "combined keywords" search module and contrastive learning training. This paper still has some shortcomings. For example, it does not explore the location and frequency information of keywords. In future work, we will adjust the weight of keywords according to their position, frequency and importance in the text, and further expand the think tank report summary dataset.
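
The ROUGE values reported above measure overlap between a generated summary and a reference summary: ROUGE-1 and ROUGE-2 count shared unigrams and bigrams, and ROUGE-L uses the longest common subsequence. The following is a minimal sketch of the F1 variants, not the authors' evaluation code. The whitespace tokenizer, the example sentences, and the function names `rouge_n` and `rouge_l` are assumptions made for illustration; Chinese text would need a character-level or word-segmentation tokenizer instead, and `rouge_l` here is the simple sentence-level form with equal weight on precision and recall.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


# Hypothetical example sentences, tokenized on whitespace.
reference = "the report recommends new rural data policy".split()
candidate = "the report recommends a data policy".split()

print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

Scores computed this way lie between 0 and 1; papers commonly report them multiplied by 100.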

Keywords