Scientific Data (May 2025)

A Temporal Knowledge Graph Generation Dataset Supervised Distantly by Large Language Models

  • Jun Zhu,
  • Yan Fu,
  • Junlin Zhou,
  • Duanbing Chen

DOI
https://doi.org/10.1038/s41597-025-05062-0
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 14

Abstract

Knowledge graphs can be constructed by extracting triples from documents, a task known as document-level relation extraction. Each triple represents a fact composed of two entities and a relation. However, the temporal information associated with these facts is typically ignored, even though it reveals the temporal connections between them. Constructing a temporal knowledge graph (TKG) from documents remains relatively unexplored. To address this limitation, we built a new dataset for this task on top of an existing document-level relation extraction dataset. We mine combination relation patterns and construct temporal quadruples by pairing facts with timestamps. In addition, two large language models (LLMs) are adopted to generate quadruples for the remaining triples that lack timestamps. Multiple filters and manual annotation are used to ensure the quality of the data. To evaluate the dataset, we propose an LLM-based framework for extracting relations with temporal information from documents. The framework casts relation extraction as a sequence-to-sequence task and fine-tunes LLMs to predict the relation, with timestamps, between entities. Experiments demonstrate the performance of LLMs on the proposed dataset.
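The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the `TemporalQuadruple` class, the `attach_timestamps` helper, and the prompt template are all hypothetical names invented here to show how triples might be paired with timestamps, with unmatched triples deferred to an LLM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemporalQuadruple:
    """A fact (head, relation, tail) extended with a timestamp."""
    head: str
    relation: str
    tail: str
    timestamp: str  # e.g. a year or date string mentioned in the document

def attach_timestamps(triples, time_mentions):
    """Pair each (head, relation, tail) triple with a timestamp found in
    the same document, returning quadruples for the matched triples and
    the leftover triples that still need a timestamp."""
    quads, leftover = [], []
    for head, relation, tail in triples:
        ts = time_mentions.get((head, relation, tail))
        if ts is not None:
            quads.append(TemporalQuadruple(head, relation, tail, ts))
        else:
            leftover.append((head, relation, tail))
    return quads, leftover

# Leftover triples could then be handed to an LLM with a prompt such as
# this hypothetical template, asking it to infer the missing timestamp.
PROMPT_TEMPLATE = (
    "Document: {doc}\n"
    "Fact: ({head}, {relation}, {tail})\n"
    "When does this fact hold? Answer with a timestamp."
)
```

In the paper's setting, quadruples produced this way would still pass through the filters and manual annotation steps mentioned above before entering the dataset.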