MaterialBERT for natural language processing of materials science texts

Michiko Yoshitake; Fumitaka Sato; Hiroyuki Kawano; Hiroshi Teraoka

doi:10.1080/27660400.2022.2124831

Science and Technology of Advanced Materials: Methods (Dec 2022)

Affiliations

Michiko Yoshitake: National Institute for Materials Science
Fumitaka Sato: National Institute for Materials Science
Hiroyuki Kawano: National Institute for Materials Science
Hiroshi Teraoka: National Institute for Materials Science

DOI: https://doi.org/10.1080/27660400.2022.2124831
Journal volume & issue: Vol. 2, no. 1
pp. 372 – 380

Abstract

A BERT (Bidirectional Encoder Representations from Transformers) model, which we named “MaterialBERT”, was generated using scientific papers from a wide area of materials science as a corpus. A new vocabulary list for the tokenizer was generated from the materials science corpus. Two BERT models with different tokenizer vocabulary lists were generated: one using the original vocabulary made by Google and the other using the vocabulary newly made by the authors. Word vectors embedded during pre-training with the two MaterialBERT models reasonably reflect the meanings of material names, both in material-class clustering and in the relationship between base materials and their compounds or derivatives, for not only inorganic materials but also organic materials and organometallic compounds. Fine-tuning on CoLA (the Corpus of Linguistic Acceptability) with the pre-trained MaterialBERT gave a higher score than the original BERT. The two MaterialBERT models could also be used as a starting point for transfer learning of a narrower domain-specific BERT.
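
The effect of the tokenizer vocabulary mentioned in the abstract can be illustrated with a minimal sketch of greedy longest-match WordPiece splitting, the subword scheme used by BERT tokenizers. The two vocabularies below are small hypothetical examples invented for illustration; they are not the vocabulary lists used for MaterialBERT, and the function name wordpiece_tokenize is likewise an assumption.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of one lowercase word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabularies (assumptions, not the published lists).
general_vocab = {"per", "##ov", "##ski", "##te", "poly", "##mer", "oxide"}
materials_vocab = general_vocab | {"perovskite", "polymer"}

general_pieces = wordpiece_tokenize("perovskite", general_vocab)
materials_pieces = wordpiece_tokenize("perovskite", materials_vocab)
print(general_pieces)    # ['per', '##ov', '##ski', '##te']
print(materials_pieces)  # ['perovskite']
```

With the general-purpose toy vocabulary the material name is split into four subword pieces, whereas the toy domain vocabulary keeps it as a single token. This is one plausible reason a corpus-specific vocabulary list may matter for how material names are embedded.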

Published in Science and Technology of Advanced Materials: Methods

ISSN: 2766-0400 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Materials of engineering and construction. Mechanics of materials
Website: https://www.tandfonline.com/journals/tstm
