IEEE Access (Jan 2022)

Extracting and Analyzing Inorganic Material Synthesis Procedures in the Literature

  • Kohei Makino,
  • Fusataka Kuniyoshi,
  • Jun Ozawa,
  • Makoto Miwa

DOI
https://doi.org/10.1109/ACCESS.2022.3160201
Journal volume & issue
Vol. 10
pp. 31524 – 31537

Abstract

Materials informatics requires large-scale collection and analysis of the material synthesis procedures described in the literature in order to design materials computationally. However, existing studies have not analyzed these procedures at the paragraph level. Moreover, because most synthesis procedures are described in natural language in articles and technical documents, they must be structured through information extraction into a format that computers can handle. Therefore, in this study, we construct a pipeline system that extracts synthesis procedures from text in the form of flow graphs and analyzes each procedure as a flow graph rather than as a set of processes. The system extracts entities with a deep learning model and relations between entities with a rule-based extractor from all paragraphs in the literature, then selects the procedures whose entities and relations form valid structures. On a benchmark dataset, the entity extractor, relation extractor, and pipeline extractor achieved micro-averaged F-scores of 0.807, 0.830, and 0.609, respectively. We applied the system to a large body of literature and extracted approximately 90,000 flow graphs (procedures) containing approximately 4 million entities and 3 million relations. We then performed several analyses, including computing statistics of the extracted graphs and identifying their frequent subgraphs. The analyses confirmed methods commonly used in materials science; for example, ethanol is often dried by heating at 60 °C, and less-reactive noble gases are rarely included in the products. These results experimentally confirm that the extracted procedures are reasonable.
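
The micro-averaged F-scores reported in the abstract pool true positives, false positives, and false negatives over all documents before computing precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `micro_f1`, the entity labels, and the character offsets are hypothetical and chosen only for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F-score over paired collections of item sets.

    gold, predicted: lists of sets, one set per document. Each item is any
    hashable description of an entity or relation (hypothetical format).
    """
    tp = fp = fn = 0
    # Pool the counts across all documents before computing the score.
    for gold_items, pred_items in zip(gold, predicted):
        tp += len(gold_items & pred_items)   # predicted and correct
        fp += len(pred_items - gold_items)   # predicted but not in gold
        fn += len(gold_items - pred_items)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: (label, start, end) tuples with made-up labels and offsets.
gold = [
    {("MATERIAL", 0, 7), ("OPERATION", 12, 17)},
    {("MATERIAL", 3, 9)},
]
predicted = [
    {("MATERIAL", 0, 7), ("OPERATION", 12, 18)},  # second span is off by one
    {("MATERIAL", 3, 9)},
]
score = micro_f1(gold, predicted)
```

In this toy case two predictions match exactly, one is spurious, and one gold item is missed, so precision and recall are both 2/3. Micro-averaging weights every item equally, so frequent entity or relation types dominate the score.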

Keywords