大数据 (Jan 2024)
Language model-enhanced edge factor mining in citation network
Abstract
Graph neural networks (GNNs) are adept at aggregating information from neighboring nodes in graph-structured data to learn node representations, and they show strong potential for citation network data mining. However, most existing GNNs do not explore the factors that drive edge formation, which limits the understanding and interpretation of complex relationships between nodes. For instance, citation relationships between papers are often driven by a variety of research topics. Although some approaches enrich node and edge feature representations by integrating large language models (LLMs), with their strong textual understanding capabilities, they have still not effectively solved the problem of uncovering the underlying drivers of edge information. In light of this, an innovative framework, LFEM (language model-enhanced edge factor mining), was proposed. It aims to make edge relationship modeling in various GNNs more discriminative through a plug-in approach, and its application value was explored in citation network link prediction. Coarse-grained factor mining extracted explicit, category-related edge factors from citation network graphs containing documents by designing structured information prompts for an LLM. Fine-grained factor mining used the K-Means clustering algorithm to capture more detailed, semantic topic-level edge factors from the textual data of the graph. To verify the effectiveness of the proposed strategy, experiments were conducted on three public datasets, and the results demonstrated a significant advantage of the LFEM framework in improving the performance of various GNN models.
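The fine-grained step can be illustrated with a minimal sketch. The abstract does not say what exactly is clustered, so everything below is an assumption made for illustration: each edge is represented by the mean of its two endpoint text embeddings, the embeddings are toy 2-D vectors rather than LLM outputs, and the names `kmeans` and `mine_edge_factors` are hypothetical, not part of LFEM. K-Means is written out in plain Python with a deterministic farthest-first initialization so the example runs without dependencies.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Lloyd's K-Means with farthest-first initialization; returns one label per point."""
    # Initialization: start from the first point, then repeatedly add the
    # point that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mine_edge_factors(node_embeddings, edges, k):
    """Assign each edge a factor id by clustering edge vectors (assumed: endpoint mean)."""
    edge_vectors = [
        [(a + b) / 2 for a, b in zip(node_embeddings[u], node_embeddings[v])]
        for u, v in edges
    ]
    return dict(zip(edges, kmeans(edge_vectors, k)))


# Toy data: papers 0-2 share one topic, papers 3-5 another (hypothetical embeddings).
node_embeddings = {
    0: [0.9, 0.1], 1: [1.0, 0.0], 2: [0.8, 0.2],
    3: [0.1, 0.9], 4: [0.0, 1.0], 5: [0.2, 0.8],
}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]

factors = mine_edge_factors(node_embeddings, edges, k=2)
for edge, factor in factors.items():
    print(edge, "-> factor", factor)
```

On this toy graph the three citations among papers 0-2 receive one factor id and the three among papers 3-5 receive the other. In a plug-in setting, such ids could be passed to a GNN as an additional edge attribute, though how LFEM consumes them is not specified in the abstract.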