Multimodal protein representation learning and target-aware variational auto-encoders for protein-binding ligand generation

Nhat Khang Ngo; Truong Son Hy

doi:10.1088/2632-2153/ad3ee4

Machine Learning: Science and Technology (Jan 2024)

Affiliations

Nhat Khang Ngo: AI Center, FPT Software, Hanoi, Vietnam
Truong Son Hy: AI Center, FPT Software, Hanoi, Vietnam; Department of Mathematics and Computer Science, Indiana State University, Terre Haute 47809, IN, United States of America

Journal volume & issue: Vol. 5, no. 2, p. 025021

Abstract

Generating ligands from the global structure of a protein target, without knowledge of specific pockets, plays a crucial role in drug discovery because it reduces the search space for potential drug-like candidates in the pipeline. However, contemporary methods require optimizing a tailored network for each protein, which is arduous and costly. To address this issue, we introduce TargetVAE, a target-aware variational auto-encoder that generates ligands with desirable properties, including high binding affinity and high synthesizability, for arbitrary target proteins. The generative model uses as its prior a multimodal deep neural network built on geometric and sequence models, named Protein Multimodal Network (PMN). PMN unifies different representations of a protein (e.g. the primary structure, i.e. the sequence of amino acids; the 3D tertiary structure; and the residue-level graph) into a single representation. Our multimodal architecture learns from the entire protein structure and captures its sequential, topological, and geometrical information by combining language modeling, graph neural networks, and geometric deep learning. We evaluate our approach through extensive experiments, including protein-ligand binding affinity prediction on the PDBbind v2020 dataset, assessment of generative model quality, ligand generation for unseen targets, and docking score computation. Empirical results demonstrate the promising and competitive performance of our proposed approach. Our software package is publicly available at https://github.com/HySonLab/Ligand_Generation.
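The idea of fusing several protein representations into one vector and using that vector to condition a latent prior can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the concatenate-and-project fusion, the toy embedding values, and the choice of a Gaussian prior centred on the fused embedding are all assumptions made purely for illustration.

```python
import math
import random


def fuse_modalities(seq_emb, graph_emb, geom_emb, weights):
    """Concatenate per-modality embeddings and apply a linear projection.

    Hypothetical stand-in for a multimodal fusion step; the real PMN
    architecture is not described in the abstract.
    """
    concat = seq_emb + graph_emb + geom_emb  # list concatenation
    return [sum(w * x for w, x in zip(row, concat)) for row in weights]


def sample_conditional_latent(protein_emb, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps, with mu = protein_emb.

    Assumes a diagonal Gaussian prior conditioned on the protein embedding.
    """
    return [
        mu + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for mu, lv in zip(protein_emb, log_var)
    ]


rng = random.Random(0)

# Toy embeddings for the three modalities (made-up values).
seq_emb, graph_emb, geom_emb = [0.1, 0.2], [0.3, 0.4], [0.5, 0.6]

# Random projection from the 6-dim concatenation to a 3-dim latent space.
latent_dim = 3
weights = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(latent_dim)]

protein_emb = fuse_modalities(seq_emb, graph_emb, geom_emb, weights)
z = sample_conditional_latent(protein_emb, [0.0] * latent_dim, rng)
```

In a full model, a decoder would map `z` to a ligand representation; that step is omitted here because the abstract does not specify it.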

Published in Machine Learning: Science and Technology

ISSN: 2632-2153 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://iopscience.iop.org/journal/2632-2153
