Science and Technology of Advanced Materials: Methods (Dec 2023)

Automatic knowledge acquisition from superconductivity information in literature

  • Kento Mitsui,
  • Yutaka Sasaki,
  • Ryoji Asahi

DOI
https://doi.org/10.1080/27660400.2023.2206532
Journal volume & issue
Vol. 3, no. 1

Abstract

In this study, we developed a natural language processing model that extracts information solely from the abstracts of literature on superconducting materials, with the aim of enabling predictions in materials science. Using a dataset of annotated (tagged) documents on superconductivity, we employed the DyGIE++ framework to extract named entities, relations, and events simultaneously. We also developed a model for classifying the subject material of each abstract. After training on 1,000 annotated abstracts, the model automatically extracted information such as material composition, superconducting transition temperature, doping information, and process information from 48,565 abstracts registered in the Scopus database since 1937. The numbers of extracted entries for superconducting materials and transition temperatures were 43,944 and 24,075, respectively, comparable to the numbers of entries in existing databases. Machine learning models were constructed to predict physical and chemical properties. For example, superconducting transition temperatures were predicted for given compositions with a mean absolute error of 15 K. In addition, the doping information indicated that the superconducting transition temperature correlates with the choice of dopant and doping site.
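
For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a mean absolute error (MAE) in kelvin could be computed for predicted transition temperatures. It is not the authors' code: the function name and the temperature values are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(measured, predicted):
    """Return the mean of |measured - predicted| over paired values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - p) for m, p in zip(measured, predicted))
    return total / len(measured)


# Hypothetical transition temperatures in kelvin (not taken from the paper).
measured_tc = [92.0, 39.0, 9.0, 18.0]
predicted_tc = [80.0, 50.0, 15.0, 25.0]

mae = mean_absolute_error(measured_tc, predicted_tc)
print(f"MAE = {mae:.1f} K")
```

An MAE of 15 K, as reported in the abstract, would mean that predicted transition temperatures deviate from the reference values by 15 K on average, irrespective of sign.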

Keywords