Xi'an Gongcheng Daxue xuebao (Apr 2022)

Public opinion data collection of power network using topic crawler

  • XI Zenghui,
  • WANG Weibin,
  • LU Jiaming,
  • QU Haini

DOI
https://doi.org/10.13338/j.issn.1674-649x.2022.02.010
Journal volume & issue
Vol. 36, no. 2
pp. 72 – 78

Abstract

Traditional methods for collecting public opinion data on the power network suffer from low recall, low calculation accuracy, and long collection times. Topic crawler technology was therefore used to improve the data collection method. First, a data collection framework was built with topic crawler technology, and the topic vector of network public opinion was constructed on that framework. Second, the topics and keywords of network public opinion were defined, and a similarity model was used to calculate the similarity between the keyword vector and each power-related web page; the scored pages were then added to the web crawler queue. Finally, a best-first search strategy was applied: the web page with the highest similarity was given first priority, and the data related to network public opinion were downloaded and stored, completing the crawl and the data collection. The experimental results show that the proposed method achieves an average recall of 92%, a web page similarity calculation accuracy above 90%, and an average data acquisition time of 36 minutes, all better than the comparison method.
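The pipeline described in the abstract (topic vector, similarity scoring, similarity-ordered queue, best-first download) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity model, so cosine similarity over term-frequency vectors is assumed here, and the names `term_vector`, `cosine_similarity`, `best_first_crawl`, the `threshold` parameter, and the in-memory `pages` dictionary that stands in for real HTTP fetching are all hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency vectors (assumed similarity model)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_first_crawl(pages, seeds, topic_keywords, threshold=0.1, max_pages=10):
    """Visit pages in descending order of topic similarity.

    pages: dict mapping url -> (text, list of outgoing urls); it replaces
           network access so the sketch runs offline (an assumption).
    Returns a list of (url, similarity) for the pages that were stored.
    """
    topic = term_vector(" ".join(topic_keywords))
    frontier = []  # min-heap of (-similarity, url): most similar page pops first
    seen = set()

    def enqueue(url):
        if url in pages and url not in seen:
            seen.add(url)
            score = cosine_similarity(topic, term_vector(pages[url][0]))
            heapq.heappush(frontier, (-score, url))

    for url in seeds:
        enqueue(url)

    collected = []
    while frontier and len(collected) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        score = -neg_score
        if score < threshold:
            continue  # off-topic page: neither stored nor expanded
        collected.append((url, score))
        for link in pages[url][1]:
            enqueue(link)
    return collected


# Toy link graph; the page texts and keys are made up for illustration.
pages = {
    "a": ("power grid outage public opinion power grid", ["b", "c"]),
    "b": ("football match results", ["d"]),
    "c": ("power outage complaints", ["d"]),
    "d": ("grid power supply public opinion", []),
}
result = best_first_crawl(pages, ["a"], ["power", "grid", "outage", "opinion"])
for url, score in result:
    print(url, round(score, 3))
```

In this toy graph the off-topic page `b` is discovered before `d` but is never stored, because the queue is ordered by similarity, not by discovery time. Scoring a page when it is enqueued presumes its text is already available; a real crawler would typically score a link from its anchor text or parent page first, which is a design choice the abstract leaves open.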

Keywords