IEEE Access (Jan 2021)

Prediction of Blood-Brain Barrier Permeability of Compounds by Fusing Resampling Strategies and eXtreme Gradient Boosting

  • Zhiwen Shi,
  • Yanyi Chu,
  • Yonghong Zhang,
  • Yanjing Wang,
  • Dong-Qing Wei

DOI
https://doi.org/10.1109/ACCESS.2020.3047852
Journal volume & issue
Vol. 9
pp. 9557 – 9566

Abstract

Computer-aided drug design is an efficient method for supporting the development of disease-related drugs. However, medicines developed against specific binding targets often perform well in cell and animal models but fail in humans. One main reason for this failure is that the human body has natural barriers, such as the blood-brain barrier, that block exogenous macromolecules. Efficient and accurate prediction of which drug molecules can pass the blood-brain barrier is therefore necessary for developing treatments for brain tissue diseases. In this study, 7658 molecular structure features were computed from the SMILES strings of 2354 drug molecules. By integrating three machine-learning feature selection algorithms, 33 chemical structure features with strong discriminative power were selected and used to construct multiple discriminant models. After a comprehensive comparison, the XGBoost model was chosen as the final prediction model. After data preprocessing and parameter optimization, the model achieved 95% accuracy on the training set. To verify the model's stability, we introduced an external data set, on which the model reached 96% accuracy. This study applies new resampling methods and machine learning algorithms, and adjusts how the resampling methods are applied in order to obtain new chemical features for constructing machine learning predictors. These features may make a significant contribution to drug development that integrates biological analysis and machine learning algorithms.
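The abstract names two pipeline steps, fusing the output of several feature selection algorithms and resampling the training data, without describing either in detail. The sketch below illustrates both in standard-library Python under stated assumptions. It assumes resampling is used to balance BBB-permeable and non-permeable classes and uses plain random oversampling, which is not necessarily the strategy the authors chose. It also assumes the three feature selections are fused by majority vote. The function names (`random_oversample`, `consensus_features`, `accuracy`) and the descriptor names in the usage example are hypothetical. XGBoost training is omitted; in practice the balanced data would be passed to a gradient-boosting classifier such as `xgboost.XGBClassifier`.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Plain random oversampling (an assumed stand-in for the paper's
    resampling strategy): duplicate randomly chosen rows of each smaller
    class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def consensus_features(selections, min_votes):
    """Hypothetical fusion rule: keep a feature if at least `min_votes`
    of the feature selection algorithms chose it."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return {f for f, v in votes.items() if v >= min_votes}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: label 1 = crosses the barrier, 0 = does not (illustrative only).
    X = [[0.1], [0.2], [0.3], [0.9]]
    y = [1, 1, 1, 0]
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))

    # Three hypothetical selector outputs over illustrative descriptor names.
    picks = [{"MW", "TPSA", "logP"}, {"TPSA", "logP", "HBD"}, {"TPSA", "nRotB"}]
    print(sorted(consensus_features(picks, min_votes=2)))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

One design point worth noting: resampling should be applied only to the training split, so that accuracy measured on held-out or external data is not inflated by duplicated samples.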

Keywords