Consensus holistic virtual screening for drug discovery: a novel machine learning model approach

Said Moshawih; Zhen Hui Bu; Hui Poh Goh; Nurolaini Kifli; Lam Hong Lee; Khang Wen Goh; Long Chiau Ming

doi:10.1186/s13321-024-00855-8

Journal of Cheminformatics (May 2024)

Affiliations

Said Moshawih: PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam
Zhen Hui Bu: Faculty of Computing and Engineering, Quest International University
Hui Poh Goh: PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam
Nurolaini Kifli: PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam
Lam Hong Lee: Faculty of Computing and Engineering, Quest International University
Khang Wen Goh: Faculty of Data Science and Information Technology, INTI International University
Long Chiau Ming: PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam

DOI: https://doi.org/10.1186/s13321-024-00855-8
Journal volume & issue: Vol. 16, no. 1
pp. 1 – 27

Abstract

In drug discovery, virtual screening is crucial for identifying potential hit compounds. This study presents a novel pipeline in which machine learning models amalgamate several conventional screening methods. A diverse array of protein targets was selected, and their corresponding datasets were subjected to active/decoy distribution analysis before being scored with four distinct methods: QSAR, pharmacophore, docking, and 2D shape similarity. These scores were then integrated into a single consensus score. The fine-tuned machine learning models were ranked using the novel formula “w_new”, consensus scores were calculated, and an enrichment study was performed for each target. Consensus scoring outperformed the other methods for specific protein targets, such as PPARG and DPP4, achieving AUC values of 0.90 and 0.84, respectively. Notably, this approach consistently prioritized compounds with higher experimental pIC50 values than all other screening methodologies did. Moreover, the models showed moderate to high performance in terms of R² values during external validation. In conclusion, this novel workflow consistently delivered superior results, emphasizing the value of a holistic approach in drug discovery, in which both quantitative metrics and active enrichment play pivotal roles in identifying the best virtual screening methodology.

Scientific contribution

We presented a novel consensus scoring workflow for virtual screening that merges diverse methods for improved compound selection. We also introduced ‘w_new’, a new metric that refines machine learning model rankings by weighting various model-specific parameters, which may improve model selection in drug discovery and potentially in other domains.
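The abstract does not give the consensus formula or the definition of w_new, so the following is only a generic sketch of the two ideas it summarizes: combining several per-compound scores into one consensus score, and checking active enrichment with ROC AUC. It assumes each method's scores are oriented so that higher means "more likely active" (docking energies are negated), standardizes each method with z-scores, and takes a weighted mean. The weights, method names, and toy data are hypothetical placeholders and are not the authors' w_new-based ranking or their dataset.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one method's raw scores so methods on different scales are comparable."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def consensus_scores(method_scores, weights):
    """Weighted mean of z-scored method scores, one value per compound.

    method_scores: dict of method name -> list of scores (higher = more likely active).
    weights: dict of method name -> positive weight (hypothetical stand-in, not w_new).
    """
    lengths = {len(s) for s in method_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every method must score the same compounds")
    total = sum(weights[m] for m in method_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    standardized = {m: zscore(s) for m, s in method_scores.items()}
    n = lengths.pop()
    return [
        sum(weights[m] * standardized[m][i] for m in method_scores) / total
        for i in range(n)
    ]


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random active outscores a random decoy (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active, 0 = decoy.
labels = [1, 1, 0, 0, 0]
scores = {
    "qsar": [7.8, 6.9, 5.1, 5.5, 4.9],            # predicted pIC50
    "pharmacophore": [0.82, 0.75, 0.40, 0.66, 0.30],
    "docking": [9.1, 8.4, 6.2, 7.0, 5.8],         # negated docking energies
    "shape_2d": [0.71, 0.64, 0.35, 0.52, 0.28],
}
weights = {"qsar": 0.3, "pharmacophore": 0.2, "docking": 0.3, "shape_2d": 0.2}

cons = consensus_scores(scores, weights)
print(round(roc_auc(cons, labels), 2))  # → 1.0
```

With this toy data every active outscores every decoy in all four methods, so the consensus AUC is 1.0. Z-score standardization is just one common way to put heterogeneous scores on a shared scale; rank averaging or min-max scaling are equally plausible choices, and the paper's own normalization may differ.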

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/