Scientific Data (Jun 2024)
A materials terminology knowledge graph automatically constructed from text corpus
Abstract
Abstract A scalable, reusable, and broad-coverage unified material knowledge representation shows its importance and will bring great benefits to data sharing among materials communities. A knowledge graph (KG) for materials terminology, which is a formal collection of term entities and relationships, is conceptually important to achieve this goal. In this work, we propose a KG for materials terminology, named Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is automatically constructed from text corpus via natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English languages, consisting of 8,660 terms and their explanations. It encompasses 11 principal categories, such as Metals, Composites, Nanomaterials, each with two or three levels of subcategories, resulting in a total of 235 distinct category labels. For further application, a knowledge web system based on MGED-KG is developed and shows its great power in improving data sharing efficiency from the aspects of query expansion, term, and data recommendation.