Communications Chemistry (Jun 2024)

Unsupervised manifold embedding to encode molecular quantum information for supervised learning of chemical data

  • Tonglei Li,
  • Nicholas J. Huls,
  • Shan Lu,
  • Peng Hou

DOI
https://doi.org/10.1038/s42004-024-01217-z
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 16

Abstract

Molecular representation is critical in chemical machine learning. It governs the complexity of model development and the amount of training data needed to avoid over- or under-fitting. As electronic structures and their associated attributes are the root cause of molecular interactions and their manifested properties, we have sought to examine the local electron information on a molecular manifold to understand and predict molecular interactions. Our efforts led to the development of a lower-dimensional representation of a molecular manifold, Manifold Embedding of Molecular Surface (MEMS), to embody surface electronic quantities. When a molecular surface is treated as a manifold and its embeddings are computed, the embedded electronic attributes retain the chemical intuition of molecular interactions. MEMS can be further featurized as input for chemical learning. Our solubility prediction with MEMS demonstrated the feasibility of both shallow and deep learning by neural networks, suggesting that MEMS is expressive and robust against dimensionality reduction.