Enhancing deep learning predictive models with HAPPY (Hierarchically Abstracted rePeat unit of PolYmers) representation

Jihun Ahn; Gabriella Pasya Irianti; Yeojin Choe; Su-Mi Hur

doi:10.1038/s41524-024-01293-8

npj Computational Materials (May 2024)

Affiliations

Jihun Ahn: Department of Polymer Engineering, Graduate School, Chonnam National University
Gabriella Pasya Irianti: Department of Polymer Engineering, Graduate School, Chonnam National University
Yeojin Choe: Department of Polymer Engineering, Graduate School, Chonnam National University
Su-Mi Hur: Department of Polymer Engineering, Graduate School, Chonnam National University

DOI: https://doi.org/10.1038/s41524-024-01293-8
Journal volume & issue: Vol. 10, no. 1
pp. 1 – 11

Abstract

We introduce HAPPY (Hierarchically Abstracted rePeat unit of PolYmers), a string representation for polymers designed to efficiently encapsulate the essential structural features of polymers for property prediction. HAPPY assigns single constituent elements to groups of sub-structures and employs grammatically complete and independent connectors between chemical linkages. Using a limited number of datapoints, we trained neural networks on both HAPPY and conventional SMILES encodings of repeat unit structures and compared their performance in predicting five polymer properties: dielectric constant, glass transition temperature, thermal conductivity, solubility, and density. The results showed that the HAPPY-based network could achieve higher prediction R-squared scores and two-fold faster training. We further tested the robustness and versatility of the HAPPY-based network with an augmented training dataset. Additionally, we present topo-HAPPY (Topological HAPPY), an extension that incorporates topological details of the constituent connectivity, leading to improved R-squared scores for solubility and glass transition temperature prediction.
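
The abstract compares models by their R-squared score. For readers unfamiliar with the metric, the following is a minimal sketch of the standard coefficient of determination, 1 - SS_res / SS_tot. It assumes the paper uses this common definition, which the abstract does not state. The function name `r_squared` and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the usual textbook definition; the paper's
    exact evaluation procedure is not given in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the paper).
measured = [350.0, 400.0, 450.0]
predicted = [360.0, 400.0, 440.0]
score = r_squared(measured, predicted)
```

A score of 1 means the predictions match the measured values exactly, while a score of 0 means the model does no better than always predicting the mean of the measured values.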

Published in npj Computational Materials

ISSN: 2057-3960 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Materials of engineering and construction. Mechanics of materials; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://www.nature.com/npjcompumats/
