Patterns (Nov 2023)
Optimal decision-making in high-throughput virtual screening pipelines
Abstract
Summary: The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property prediction models make screening practically challenging. In this work, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate the computational resources to models with varying costs and accuracy to optimize the return on computational investment. Based on both simulated and real-world data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate virtual screening without any degradation in accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.
The bigger picture: Screening large pools of molecular candidates to identify those that meet specific design criteria or have targeted properties is demanding in various science and engineering domains. While a high-throughput virtual screening (HTVS) pipeline can provide an efficient means of achieving this goal, its design and operation often rely on experts’ intuition, potentially resulting in suboptimal performance. In this paper, we fill this critical gap by presenting a systematic framework that can maximize the return on computational investment (ROCI) of such HTVS campaigns. Based on various scenarios, we empirically validate the proposed framework and demonstrate its potential to accelerate scientific discoveries through optimal computational campaigns, especially in the context of virtual screening.