LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP

Yitian Wang; Jiacheng Xiong; Fu Xiao; Wei Zhang; Kaiyang Cheng; Jingxin Rao; Buying Niu; Xiaochu Tong; Ning Qu; Runze Zhang; Dingyan Wang; Kaixian Chen; Xutong Li; Mingyue Zheng

doi:10.1186/s13321-023-00754-4

Journal of Cheminformatics (Sep 2023)

LogD7.4 prediction enhanced by transferring knowledge from chromatographic retention time, microscopic pKa and logP

Yitian Wang,
Jiacheng Xiong,
Fu Xiao,
Wei Zhang,
Kaiyang Cheng,
Jingxin Rao,
Buying Niu,
Xiaochu Tong,
Ning Qu,
Runze Zhang,
Dingyan Wang,
Kaixian Chen,
Xutong Li,
Mingyue Zheng

Affiliations

Yitian Wang: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Jiacheng Xiong: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Fu Xiao: Nanjing University of Chinese Medicine
Wei Zhang: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Kaiyang Cheng: Nanjing University of Chinese Medicine
Jingxin Rao: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Buying Niu: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Xiaochu Tong: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Ning Qu: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Runze Zhang: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Dingyan Wang: Lingang Laboratory
Kaixian Chen: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Xutong Li: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Mingyue Zheng: Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences

DOI: https://doi.org/10.1186/s13321-023-00754-4
Journal volume & issue: Vol. 15, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Lipophilicity is a fundamental physical property that significantly affects various aspects of drug behavior, including solubility, permeability, metabolism, distribution, protein binding, and toxicity. Accurate prediction of lipophilicity, measured by the logD7.4 value (the distribution coefficient between n-octanol and buffer at physiological pH 7.4), is crucial for successful drug discovery and design. However, the limited availability of data for logD modeling poses a significant challenge to achieving satisfactory generalization capability. To address this challenge, we have developed a novel logD7.4 prediction model called RTlogD, which leverages knowledge from multiple sources. RTlogD combines pre-training on a chromatographic retention time (RT) dataset since the RT is influenced by lipophilicity. Additionally, microscopic pKa values are incorporated as atomic features, providing valuable insights into ionizable sites and ionization capacity. Furthermore, logP is integrated as an auxiliary task within a multitask learning framework. We conducted ablation studies and presented a detailed analysis, showcasing the effectiveness and interpretability of RT, pKa, and logP in the RTlogD model. Notably, our RTlogD model demonstrated superior performance compared to commonly used algorithms and prediction tools. These results underscore the potential of the RTlogD model to improve the accuracy and generalization of logD prediction in drug discovery and design. In summary, the RTlogD model addresses the challenge of limited data availability in logD modeling by leveraging knowledge from RT, microscopic pKa, and logP. Incorporating these factors enhances the predictive capabilities of our model, and it holds promise for real-world applications in drug discovery and design scenarios. Graphical Abstract

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/

About the journal

Abstract

Keywords