IEEE Access (Jan 2025)
Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization
Abstract
Large language models have demonstrated exceptional performance across various natural language processing tasks. However, their reliance on unstructured text corpora for pre-training limits their effectiveness in tasks requiring structured reasoning, such as multi-hop question answering. Knowledge graphs provide a rich, structured source of relational data, offering an opportunity to enhance the reasoning capabilities of large language models. In this paper, we propose a novel framework, Knowledge Graph as Pre-training Corpus (KGPC), which transforms knowledge graphs into text using a multi-hop linearization process. Unlike existing approaches that linearize individual triples, our method captures the interconnected nature of knowledge graphs by linking multiple triples across multiple hops, preserving their relational structure during the pre-training phase. This structured knowledge injection improves the ability of language models to perform complex reasoning tasks. We evaluate our approach on multi-hop reasoning benchmarks, demonstrating significant performance gains over existing models, particularly in question-answering tasks. Our results highlight the potential of multi-hop linearization for enhancing the structural reasoning capacity of language models, reducing error propagation, and improving the integration of structured knowledge into pre-training.
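To make the idea of multi-hop linearization concrete, the following minimal Python sketch chains connected triples from a knowledge graph into a single pre-training sentence. It is illustrative only and does not reproduce the paper's KGPC pipeline; the function name linearize_path, the parameter num_hops, and the random-walk hop selection are assumptions introduced for this example.

    from collections import defaultdict
    import random

    def linearize_path(kg_triples, start_entity, num_hops=2, seed=0):
        """Chain connected triples starting from start_entity for num_hops
        hops and verbalize the resulting path as one text sequence.
        Hypothetical sketch; not the paper's actual implementation."""
        rng = random.Random(seed)
        # Index outgoing edges by head entity for fast hop expansion.
        outgoing = defaultdict(list)
        for head, relation, tail in kg_triples:
            outgoing[head].append((relation, tail))

        path, current = [], start_entity
        for _ in range(num_hops):
            if not outgoing[current]:
                break  # no further hop available from this entity
            relation, tail = rng.choice(outgoing[current])
            path.append((current, relation, tail))
            current = tail

        # Verbalize the multi-hop path as a single sentence sequence,
        # keeping the relational chain explicit rather than emitting
        # each triple in isolation.
        return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in path)

    # Example: two connected triples yield one two-hop training sentence.
    triples = [
        ("Barack Obama", "born_in", "Honolulu"),
        ("Honolulu", "located_in", "Hawaii"),
    ]
    print(linearize_path(triples, "Barack Obama"))
    # -> "Barack Obama born in Honolulu. Honolulu located in Hawaii."

In contrast, single-triple linearization would emit the two facts as unrelated sentences, losing the bridge entity ("Honolulu") that multi-hop reasoning relies on.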
Keywords