Comprehensive strategies of machine-learning-based quantitative structure-activity relationship models
Jiashun Mao,
Javed Akhtar,
Xiao Zhang,
Liang Sun,
Shenghui Guan,
Xinyu Li,
Guangming Chen,
Jiaxin Liu,
Hyeon-Nae Jeon,
Min Sung Kim,
Kyoung Tai No,
Guanyu Wang
Affiliations
Jiashun Mao
The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea; Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055 China
Javed Akhtar
Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
Xiao Zhang
Shanghai Rural Commercial Bank Co., Ltd, Shanghai 200002, China
Liang Sun
Department of Physics, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong, China
Shenghui Guan
Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055 China
Xinyu Li
School of Life and Health Sciences and Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
Guangming Chen
Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China
Jiaxin Liu
Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Hyeon-Nae Jeon
Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Min Sung Kim
Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
Kyoung Tai No
The Interdisciplinary Graduate Program in Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon 21983, Republic of Korea; Corresponding author
Guanyu Wang
Department of Biology, School of Life Sciences, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen, Guangdong 518055, China; Guangdong Provincial Key Laboratory of Computational Science and Material Design, Shenzhen, Guangdong 518055 China; Guangdong Provincial Key Laboratory of Cell Microenvironment and Disease Research, Shenzhen, Guangdong 518055, China; Corresponding author
Summary: Early quantitative structure-activity relationship (QSAR) technologies have unsatisfactory versatility and accuracy in fields such as drug discovery because they are based on traditional machine learning and interpretive expert features. The development of Big Data and deep learning technologies significantly improve the processing of unstructured data and unleash the great potential of QSAR. Here we discuss the integration of wet experiments (which provide experimental data and reliable verification), molecular dynamics simulation (which provides mechanistic interpretation at the atomic/molecular levels), and machine learning (including deep learning) techniques to improve QSAR models. We first review the history of traditional QSAR and point out its problems. We then propose a better QSAR model characterized by a new iterative framework to integrate machine learning with disparate data input. Finally, we discuss the application of QSAR and machine learning to many practical research fields, including drug development and clinical trials.