Communications Chemistry (Jan 2024)

Automatic feature engineering for catalyst design using small data without prior knowledge of target catalysis

  • Toshiaki Taniike,
  • Aya Fujiwara,
  • Sunao Nakanowatari,
  • Fernando García-Escobar,
  • Keisuke Takahashi

DOI
https://doi.org/10.1038/s42004-023-01086-y
Journal volume & issue
Vol. 7, no. 1
pp. 1–8

Abstract


Descriptor design in catalyst informatics is largely empirical. Particularly when data are limited, it requires adequate prior knowledge of the target catalysis, which presents a logical contradiction when the goal is to explore unknown territory. This study introduces a technique for automatic feature engineering (AFE) that works on small catalyst datasets and does not rely on specific assumptions or pre-existing knowledge about the target catalysis when designing descriptors and building machine-learning models. The technique generates a large number of features by applying mathematical operations to general physicochemical features of the catalytic components, then extracts the features relevant to the desired catalysis, in effect screening numerous hypotheses computationally. AFE yields reasonable regression results for three types of heterogeneous catalysis: oxidative coupling of methane (OCM), conversion of ethanol to butadiene, and three-way catalysis, with only the training set changed between them. Moreover, by applying active learning that combines AFE with high-throughput experimentation for OCM, we visualize how the machine progressively acquires an accurate recognition of catalyst design. AFE is therefore a versatile technique for data-driven catalysis research and a key step towards fully automated catalyst discovery.
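The abstract describes AFE as a two-stage process: generate many candidate features from general physicochemical features, then select the ones relevant to the target catalysis. The abstract does not specify the operators, regressor, or selection criterion. The following is a minimal sketch of that generate-then-select idea under stated assumptions: a small hypothetical operator set (square, square root, pairwise product and ratio), an ordinary least-squares linear model, and greedy forward selection scored by leave-one-out mean absolute error. The function names (`generate_features`, `loocv_mae`, `forward_select`) and the synthetic data are illustrative only and are not the authors' implementation.

```python
import itertools
import numpy as np


def generate_features(X, names):
    """Expand primary features with a hypothetical set of unary and pairwise operations."""
    cols, labels = [], []
    # Unary transforms of each primary feature
    for j, name in enumerate(names):
        x = X[:, j]
        cols += [x, x ** 2, np.sqrt(np.abs(x))]
        labels += [name, f"{name}^2", f"sqrt|{name}|"]
    # Pairwise products and ratios of primary features
    for i, j in itertools.combinations(range(len(names)), 2):
        a, b = X[:, i], X[:, j]
        cols.append(a * b)
        labels.append(f"{names[i]}*{names[j]}")
        # Replace zero denominators with a tiny value to avoid division by zero
        cols.append(a / np.where(b == 0, 1e-12, b))
        labels.append(f"{names[i]}/{names[j]}")
    return np.column_stack(cols), labels


def loocv_mae(F, y):
    """Leave-one-out mean absolute error of a least-squares linear model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), F])
    errors = []
    for k in range(n):
        mask = np.arange(n) != k
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        errors.append(abs(A[k] @ coef - y[k]))
    return float(np.mean(errors))


def forward_select(F, y, max_features=3):
    """Greedily add the generated feature that most lowers the LOOCV error."""
    chosen, best = [], np.inf
    for _ in range(max_features):
        candidates = [(loocv_mae(F[:, chosen + [j]], y), j)
                      for j in range(F.shape[1]) if j not in chosen]
        score, j = min(candidates)
        if score >= best:
            break  # stop when no candidate improves the LOOCV error
        chosen.append(j)
        best = score
    return chosen, best


if __name__ == "__main__":
    # Synthetic "catalyst" data: the target depends only on the product of a and b
    rng = np.random.default_rng(0)
    X = rng.uniform(1.0, 2.0, size=(20, 3))
    y = 3.0 * X[:, 0] * X[:, 1] + 0.5
    F, labels = generate_features(X, ["a", "b", "c"])
    chosen, mae = forward_select(F, y)
    print([labels[j] for j in chosen], mae)  # first selected feature should be a*b
```

Because the hypothetical target is built from `a*b`, the selection step should recover that generated feature first, which illustrates the "screening hypotheses" idea: each generated feature is a candidate hypothesis, and cross-validated error decides which ones survive. With real small datasets, the number of generated features grows quickly, so a practical version would need a cheaper search or a more robust regressor than the one assumed here.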