Artificial Intelligence Chemistry (Jun 2024)
Drug discovery and development in the era of artificial intelligence: From machine learning to large language models
Abstract
Drug research and development (R&D) is a complex and difficult process that currently faces long timelines, high investment, and a high failure rate. Machine learning, with its ability to learn from big data and complex networks, is increasingly effective at improving the efficiency and success rate of drug R&D. Here we review recent examples of the application of machine learning methods in six areas, including disease gene prediction, virtual screening, drug molecule generation, molecular attribute prediction, and prediction of drug combination synergism. We also discuss the advantages of integrative learning in multi-attribute prediction. Integrative models built from base learners trained on data of different dimensions make fuller use of the information contained in these data and improve average prediction performance. Finally, we envision a new paradigm for drug discovery and development in which a large language model acts as a central hub that organizes public resources into a knowledge base and validates that knowledge with computational software, smaller predictive models, and high-throughput automated screening platforms based on organoid technologies. This paradigm is intended to speed up development and to reduce the differences in efficacy between disease models and humans, thereby improving the success rate of drug development.