npj Computational Materials (May 2024)

Enhancing deep learning predictive models with HAPPY (Hierarchically Abstracted rePeat unit of PolYmers) representation

  • Jihun Ahn,
  • Gabriella Pasya Irianti,
  • Yeojin Choe,
  • Su-Mi Hur

DOI
https://doi.org/10.1038/s41524-024-01293-8
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 11

Abstract

We introduce HAPPY (Hierarchically Abstracted rePeat unit of PolYmers), a string representation for polymers designed to efficiently encapsulate the essential structural features of a polymer for property prediction. HAPPY assigns single constituent elements to groups of sub-structures and employs grammatically complete and independent connectors between chemical linkages. Using a limited number of datapoints, we trained neural networks on both HAPPY and conventional SMILES encodings of repeat unit structures and compared their performance in predicting five polymer properties: dielectric constant, glass transition temperature, thermal conductivity, solubility, and density. The results showed that the HAPPY-based network could achieve higher prediction R-squared scores and two-fold faster training times. We further tested the robustness and versatility of the HAPPY-based network with an augmented training dataset. Additionally, we present topo-HAPPY (Topological HAPPY), an extension that incorporates topological details of the constituent connectivity, leading to improved R-squared scores for solubility and glass transition temperature prediction.
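
For readers unfamiliar with the metric used to compare the HAPPY-based and SMILES-based networks, the following is a minimal sketch of the standard R-squared (coefficient of determination) calculation. It is not taken from the paper: the function name `r_squared` and the sample values are illustrative assumptions, and the authors' evaluation code may differ.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")

    return 1.0 - ss_res / ss_tot


# Placeholder numbers, not data from the paper.
measured = [1.0, 2.0, 3.0]
predicted_perfect = [1.0, 2.0, 3.0]
predicted_mean_only = [2.0, 2.0, 2.0]

print(r_squared(measured, predicted_perfect))
print(r_squared(measured, predicted_mean_only))
```

A score of 1 indicates perfect predictions, while a model that always predicts the mean of the measured values scores 0, so a higher R-squared means the network explains more of the variance in the property.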