Complexity (Jan 2021)
An Improved PageRank Algorithm Based on Text Similarity Approach for Critical Standards Identification in Complex Standard Citation Networks
Abstract
A standard system is a powerful tool for maintaining the normal operation and development of a specific industry. It is intrinsically a complex network of numerous standards that coordinate and interact with each other. In such a networked standard system, identifying critical standards is of great significance when drafting and revising standards. However, most of the existing literature focuses on the citation relationships between standards and ignores the intrinsic interdependence between their contents. To overcome this limitation, we use the text similarity approach (TSA) to quantify the relationship intensity between each pair of standards and thereby generate a directed weighted network. The main contribution of this study is to incorporate the similarity computed by the TSA into the traditional PageRank algorithm for identifying critical standards. The improved algorithm jointly considers the quantity and importance of neighboring standards and the citation intensity quantified by the TSA. Finally, the algorithm is validated on the Chinese environmental health standards through comparison with the traditional PageRank algorithm and several classic measurements.
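The idea of weighting citations by text similarity inside PageRank can be illustrated with a minimal Python sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes that cosine similarity over bag-of-words term frequencies stands in for the TSA, that a citing standard passes its score to the standards it cites in proportion to that similarity, that standards citing nothing spread their score uniformly, and that the damping factor is the conventional 0.85. The function names and the three toy standards (S1, S2, S3) are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words term-frequency vectors (assumed stand-in for the TSA)."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_pagerank(texts, citations, damping=0.85, tol=1e-10, max_iter=200):
    """Similarity-weighted PageRank.

    texts: dict mapping standard id -> text content
    citations: iterable of (citing_id, cited_id) pairs
    """
    nodes = list(texts)
    n = len(nodes)

    # Weight each citation edge by the similarity of the two standards' texts.
    out_edges = {u: {} for u in nodes}
    for citing, cited in citations:
        out_edges[citing][cited] = cosine_similarity(texts[citing], texts[cited])

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            total = sum(out_edges[u].values())
            if total == 0.0:
                # No citations (or only zero-similarity ones): spread uniformly.
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
            else:
                # Pass score to cited standards in proportion to similarity.
                for v, w in out_edges[u].items():
                    new_rank[v] += damping * rank[u] * w / total
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy data: S1 cites both S2 and S3, but its text is closer to S2.
texts = {
    "S1": "drinking water quality limits",
    "S2": "drinking water quality sampling",
    "S3": "workplace noise limits",
}
citations = [("S1", "S2"), ("S1", "S3")]
ranks = weighted_pagerank(texts, citations)
for standard, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(standard, round(score, 4))
```

In this toy network, unweighted PageRank would score S2 and S3 identically, since each is cited once by S1. With similarity weights, S2 ranks higher because its content is closer to that of the citing standard.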