Scientific Data (May 2025)

A Temporal Knowledge Graph Generation Dataset Supervised Distantly by Large Language Models

  • Jun Zhu,
  • Yan Fu,
  • Junlin Zhou,
  • Duanbing Chen

DOI
https://doi.org/10.1038/s41597-025-05062-0
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 14

Abstract

Knowledge graphs can be constructed by extracting triples from documents, a task known as document-level relation extraction. Each triple represents a fact composed of two entities and a relation. However, the temporal information associated with these facts is typically ignored, even though it reveals the temporal connections between them. Constructing a temporal knowledge graph (TKG) from documents remains relatively unexplored. To address this limitation, we built a new dataset for this task on top of an existing document-level relation extraction dataset. We mine combination relation patterns and construct temporal quadruples by pairing facts with timestamps. In addition, two large language models (LLMs) are adopted to generate quadruples for the remaining triples that lack timestamps. Multiple filters and manual annotation are used to ensure the quality of the data. To evaluate the dataset, we propose an LLM-based framework for extracting relations with temporal information from documents. The framework casts relation extraction as a sequence-to-sequence task and fine-tunes LLMs to predict the relation, with timestamps, between entities. Experiments demonstrate the performance of LLMs on the proposed dataset.
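The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `TemporalQuadruple` class, the `attach_timestamps` helper, and the prompt template are all hypothetical names invented here to show how triples might be paired with timestamps, with unmatched triples deferred to an LLM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemporalQuadruple:
    """A fact (head, relation, tail) extended with a timestamp."""
    head: str
    relation: str
    tail: str
    timestamp: str  # e.g. a year or date string mentioned in the document

def attach_timestamps(triples, time_mentions):
    """Pair each (head, relation, tail) triple with a timestamp found in
    the same document, returning quadruples for the matched triples and
    the leftover triples that still need a timestamp."""
    quads, leftover = [], []
    for head, relation, tail in triples:
        ts = time_mentions.get((head, relation, tail))
        if ts is not None:
            quads.append(TemporalQuadruple(head, relation, tail, ts))
        else:
            leftover.append((head, relation, tail))
    return quads, leftover

# Leftover triples could then be handed to an LLM with a prompt such as
# this hypothetical template, asking it to infer the missing timestamp.
PROMPT_TEMPLATE = (
    "Document: {doc}\n"
    "Fact: ({head}, {relation}, {tail})\n"
    "When does this fact hold? Answer with a timestamp."
)
```

In the paper's setting, quadruples produced this way would still pass through the filters and manual annotation steps mentioned above before entering the dataset.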