Complexity (Jan 2021)

Using the Ship-Gram Model for Japanese Keyword Extraction Based on News Reports

  • Miao Teng

DOI
https://doi.org/10.1155/2021/9965843
Journal volume & issue
Vol. 2021

Abstract

Read online

In this paper, we conduct an in-depth study of Japanese keyword extraction from news reports, train external computer document word sets from text preprocessing into word vectors using the Ship-gram model in the deep learning tool Word2Vec, and calculate the cosine distance between word vectors. In this paper, the sliding window in TextRank is designed to connect internal document information to improve the in-text semantic coherence. The main idea is to use not only the statistical and structural features of words but also the semantic features of words extracted through word-embedding techniques, i.e., multifeature fusion, to obtain the importance weights of words themselves and the attraction weights between words and then iteratively calculate the final weight of each word through the graph model algorithm to determine the extracted keywords. To verify the performance of the algorithm, extensive simulation experimental studies were conducted on three different types of datasets. The experimental results show that the proposed keyword extraction algorithm can improve the performance by a maximum of 6.45% and 20.36% compared with the existing word frequency statistics and graph model methods, respectively; MF-Rank can achieve a maximum performance improvement of 1.76% compared with PW-TF.