Automatic feature engineering for catalyst design using small data without prior knowledge of target catalysis

Toshiaki Taniike; Aya Fujiwara; Sunao Nakanowatari; Fernando García-Escobar; Keisuke Takahashi

doi:10.1038/s42004-023-01086-y

Communications Chemistry (Jan 2024)

Affiliations

Toshiaki Taniike: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology
Aya Fujiwara: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology
Sunao Nakanowatari: Graduate School of Advanced Science and Technology, Japan Advanced Institute of Science and Technology
Fernando García-Escobar: Department of Chemistry, Hokkaido University
Keisuke Takahashi: Department of Chemistry, Hokkaido University

Journal volume & issue: Vol. 7, no. 1
pp. 1 – 8

Abstract

Descriptor design in catalyst informatics is largely empirical and, particularly when data are limited, requires adequate prior knowledge, which is contradictory when the goal is to explore unknown territory. This study introduces a technique for automatic feature engineering (AFE) that works on small catalyst datasets and relies on no specific assumptions or pre-existing knowledge about the target catalysis when designing descriptors and building machine-learning models. The technique generates numerous features by applying mathematical operations to general physicochemical features of the catalytic components and then extracts the features relevant to the desired catalysis, in effect screening numerous hypotheses computationally. AFE yields reasonable regression results for three types of heterogeneous catalysis (oxidative coupling of methane (OCM), conversion of ethanol to butadiene, and three-way catalysis), with only the training data changed between them. Moreover, by applying active learning that combines AFE with high-throughput experimentation for OCM, we visualize how the machine progressively acquires an accurate recognition of catalyst design. AFE is thus a versatile technique for data-driven catalysis research and a key step towards fully automated catalyst discovery.
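The abstract describes AFE only at a high level: expand general physicochemical features with mathematical operations, then screen the resulting candidates for relevance to the target. The Python sketch below illustrates that generate-then-screen idea on synthetic data; it is not the authors' implementation. The operator set (square, natural log, pairwise product and ratio), the correlation-based ranking, the single-feature linear fit, and all names (`generate_features`, `screen`, `fit_line`, `make_toy_data`) are hypothetical choices made for illustration, and the toy features `a` to `d` merely stand in for real physicochemical properties of catalyst components.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def generate_features(primary):
    """Expand primary features with simple mathematical operations.

    `primary` maps a feature name to one value per catalyst. The operator set
    used here (square, ln, pairwise product and ratio) is an assumption.
    """
    feats = dict(primary)
    for name, vals in primary.items():
        feats[f"({name})^2"] = [v * v for v in vals]
        if all(v > 0 for v in vals):  # log only defined for positive values
            feats[f"ln({name})"] = [math.log(v) for v in vals]
    for (na, va), (nb, vb) in itertools.combinations(primary.items(), 2):
        feats[f"{na}*{nb}"] = [x * y for x, y in zip(va, vb)]
    for (na, va), (nb, vb) in itertools.permutations(primary.items(), 2):
        if all(y != 0 for y in vb):  # skip ratios with a zero denominator
            feats[f"{na}/{nb}"] = [x / y for x, y in zip(va, vb)]
    return feats


def screen(features, target, top_k=3):
    """Rank candidate features (hypotheses) by |Pearson r| with the target."""
    ranked = sorted(features.items(),
                    key=lambda kv: abs(pearson(kv[1], target)),
                    reverse=True)
    return [(name, pearson(vals, target)) for name, vals in ranked[:top_k]]


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept on one feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def make_toy_data(n=20, seed=0):
    """Synthetic 'small data': 4 primary features, target hidden as a*c + noise."""
    rng = random.Random(seed)
    primary = {name: [rng.uniform(1.0, 2.0) for _ in range(n)] for name in "abcd"}
    y = [a * c + rng.gauss(0.0, 0.01) for a, c in zip(primary["a"], primary["c"])]
    return primary, y


if __name__ == "__main__":
    primary, y = make_toy_data()
    features = generate_features(primary)
    for name, r in screen(features, y):
        print(f"{name:10s} r = {r:+.4f}")
    print("fit on best feature:", fit_line(features[screen(features, y, 1)[0][0]], y))
```

On this toy data the product `a*c`, which was used to generate the target, should rank first among the 30 candidates. The abstract does not specify how the real method selects features or which regressor it uses, so the correlation ranking and one-feature fit here should be read as placeholders for those steps.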

Published in Communications Chemistry

ISSN: 2399-3669 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science: Chemistry
Website: https://www.nature.com/commschem

About the journal