Variational autoencoder-based chemical latent space for large molecular structures with 3D complexity

Toshiki Ochiai; Tensei Inukai; Manato Akiyama; Kairi Furui; Masahito Ohue; Nobuaki Matsumori; Shinsuke Inuki; Motonari Uesugi; Toshiaki Sunazuka; Kazuya Kikuchi; Hideaki Kakeya; Yasubumi Sakakibara

doi:10.1038/s42004-023-01054-6

Communications Chemistry (Nov 2023)

Affiliations

Toshiki Ochiai: Department of Biosciences and Informatics, Keio University
Tensei Inukai: Department of Biosciences and Informatics, Keio University
Manato Akiyama: Department of Biosciences and Informatics, Keio University
Kairi Furui: Department of Computer Science, School of Computing, Tokyo Institute of Technology
Masahito Ohue: Department of Computer Science, School of Computing, Tokyo Institute of Technology
Nobuaki Matsumori: Department of Chemistry, Graduate School of Science, Kyushu University, Fukuoka
Shinsuke Inuki: Division of Medicinal Frontier Sciences, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto
Motonari Uesugi: Institute for Chemical Research and WPI-iCeMS, Kyoto University
Toshiaki Sunazuka: Omura Satoshi Memorial Institute and Graduate School of Infection Control Sciences, Kitasato University
Kazuya Kikuchi: Department of Applied Chemistry, Graduate School of Engineering, Osaka University
Hideaki Kakeya: Division of Medicinal Frontier Sciences, Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto
Yasubumi Sakakibara: Department of Biosciences and Informatics, Keio University

DOI: https://doi.org/10.1038/s42004-023-01054-6
Journal volume & issue: Vol. 6, no. 1
pp. 1–14

Abstract

The structural diversity of chemical libraries, which are systematic collections of compounds with the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features; it expresses the structural diversity within a compound library, allowing a broader chemical space to be explored and novel compound structures to be generated as drug candidates. In this study, we developed a deep-learning method called NP-VAE (Natural Product-oriented Variational Autoencoder), based on the variational autoencoder, for handling hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE successfully constructed a chemical latent space from large compounds that existing methods could not handle, achieved higher reconstruction accuracy, and performed stably as a generative model across various indices. Furthermore, by exploring the acquired latent space, we comprehensively analyzed a compound library containing natural compounds and generated novel compound structures with optimized functions.
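
To make the idea of projecting a compound into a latent space concrete, the following is a minimal sketch of the generic variational autoencoder encoding step, not the NP-VAE architecture itself. It assumes a hypothetical three-value molecular feature vector and toy linear weights (`features`, `w_mu`, `w_logvar` are illustrative names, not from the paper), and shows the standard reparameterization trick together with the KL term that regularizes the latent space toward a standard normal distribution.

```python
import math
import random

def encode(features, w_mu, w_logvar):
    """Map a feature vector to the mean and log-variance of a Gaussian in latent space."""
    mu = [sum(w * x for w, x in zip(row, features)) for row in w_mu]
    logvar = [sum(w * x for w, x in zip(row, features)) for row in w_logvar]
    return mu, logvar

def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, I), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# Hypothetical molecular descriptor and toy weights for a 2-D latent space.
features = [1.0, 0.0, 2.0]
w_mu = [[0.5, 0.0, 0.25], [0.0, 1.0, -0.5]]
w_logvar = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

mu, logvar = encode(features, w_mu, w_logvar)
z = sample_latent(mu, logvar, random.Random(0))  # one point in the latent space
kl = kl_to_standard_normal(mu, logvar)
```

In a trained model the linear maps would be replaced by a learned encoder over the molecular structure, and a decoder would reconstruct a compound from `z`; nearby points in the latent space would then correspond to structurally related compounds.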

Published in Communications Chemistry

ISSN: 2399-3669 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science: Chemistry
Website: https://www.nature.com/commschem
