Scientific Data (Jun 2024)

A materials terminology knowledge graph automatically constructed from text corpus

  • Yuwei Zhang,
  • Fangyi Chen,
  • Zeyi Liu,
  • Yunzhuo Ju,
  • Dongliang Cui,
  • Jinyi Zhu,
  • Xue Jiang,
  • Xi Guo,
  • Jie He,
  • Lei Zhang,
  • Xiaotong Zhang,
  • Yanjing Su

DOI
https://doi.org/10.1038/s41597-024-03448-0
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 11

Abstract

A scalable, reusable, broad-coverage unified representation of materials knowledge is important for data sharing among materials communities. A knowledge graph (KG) for materials terminology, that is, a formal collection of term entities and the relationships between them, is a key step toward this goal. In this work, we propose a KG for materials terminology, named Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is constructed automatically from a text corpus via natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English, consisting of 8,660 terms and their explanations. It encompasses 11 principal categories, such as Metals, Composites, and Nanomaterials, each with two or three levels of subcategories, for a total of 235 distinct category labels. For further application, a knowledge web system based on MGED-KG was developed; it improves data-sharing efficiency through query expansion, term recommendation, and data recommendation.
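
To illustrate how a terminology graph can support query expansion, the following is a minimal sketch only. It assumes a toy adjacency-list graph and a simple breadth-first traversal; the graph contents, the relation name `has_subcategory`, and the function `expand_query` are hypothetical and are not taken from MGED-KG or its web system.

```python
from collections import deque

# Hypothetical toy graph: term -> list of (relation, neighbor).
# These entries are illustrative and are NOT data from MGED-KG.
TOY_KG = {
    "Metals": [("has_subcategory", "Steel"),
               ("has_subcategory", "Aluminum alloys")],
    "Steel": [("has_subcategory", "Stainless steel")],
    "Aluminum alloys": [],
    "Stainless steel": [],
}


def expand_query(term, kg, max_hops=1):
    """Return the query term plus all terms within max_hops edges (breadth-first)."""
    expanded = [term]              # keeps discovery order, query term first
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_hops:
            continue               # do not expand beyond the hop limit
        for _relation, neighbor in kg.get(current, []):
            if neighbor not in expanded:
                expanded.append(neighbor)
                queue.append((neighbor, depth + 1))
    return expanded


if __name__ == "__main__":
    print(expand_query("Metals", TOY_KG, max_hops=1))
    print(expand_query("Metals", TOY_KG, max_hops=2))
```

In this sketch a search for "Metals" would also match records tagged with related terms, and `max_hops` controls how far the expansion reaches into the graph. A real system would likely weight relation types differently rather than treating every edge alike.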