Jisuanji kexue yu tansuo (Feb 2020)

Application of Multi-Layered Gradient Boosting Decision Trees in Pharmaceutical Classification

  • DU Shishuai, QIU Tian, LI Lingqiao, HU Jinquan, ZHENG Anbing, FENG Yanchun, HU Changqin, YANG Huihua

DOI
https://doi.org/10.3778/j.issn.1673-9418.1901069
Journal volume & issue
Vol. 14, no. 2
pp. 260 – 273

Abstract

Near-infrared spectroscopy is highly effective in pharmaceutical analysis. For high-dimensional, non-linear, small-sample near-infrared data, traditional drug identification algorithms lack sufficient feature learning ability, neural network-based methods suffer from local optima and over-fitting, and these methods tend to ignore sample imbalance. To address these shortcomings, a pharmaceutical classification approach using multi-layered gradient boosting decision trees, based on feature selection and cost-sensitive learning (CS_FGBDT), is proposed. First, the raw data are preprocessed by Savitzky-Golay smoothing and first-derivative computation. Second, a random forest is used to adaptively extract features from the preprocessed spectra, and the feature map is constructed by multi-layered gradient boosting trees. Then, cost-sensitive learning is incorporated to minimize the negative effect of sample imbalance. Experimental results on two imbalanced datasets, one of capsules and one of tablets, show that the model achieves higher prediction accuracy and stability than the methods it was compared with and is an effective method for drug identification.
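
The three stages named in the abstract can be illustrated with a rough sketch. It is not the authors' CS_FGBDT implementation: a single scikit-learn GradientBoostingClassifier stands in for the multi-layered gradient boosting trees, "balanced" sample weights stand in for the paper's cost-sensitive scheme, and the synthetic spectra, the function names, and every parameter value (window length, polynomial order, number of retained features, tree counts) are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight


def preprocess(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing and first derivative, applied to each spectrum (row)."""
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, deriv=1, axis=1)


def select_features(X, y, n_keep=20, seed=0):
    """Rank wavelengths by random forest importance and keep the top n_keep indices."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    top = np.argsort(forest.feature_importances_)[::-1][:n_keep]
    return np.sort(top)


def fit_cost_sensitive_gbdt(X, y, seed=0):
    """Fit a single-layer GBDT (a stand-in, not multi-layered) with class-balanced weights."""
    # Assumed cost scheme: each sample is weighted inversely to its class frequency.
    weights = compute_sample_weight(class_weight="balanced", y=y)
    model = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    model.fit(X, y, sample_weight=weights)
    return model


# Synthetic, imbalanced "spectra" (hypothetical data, 90 vs. 10 samples).
rng = np.random.default_rng(0)
n_major, n_minor, n_wavelengths = 90, 10, 200
y = np.array([0] * n_major + [1] * n_minor)
axis = np.linspace(0.0, 1.0, n_wavelengths)
spectra = np.sin(2 * np.pi * axis) + 0.05 * rng.normal(size=(y.size, n_wavelengths))
# The minority class gets an extra absorption-like bump near the middle of the range.
spectra[y == 1] += 0.5 * np.exp(-((axis - 0.5) ** 2) / 0.002)

X = preprocess(spectra)
selected = select_features(X, y)
model = fit_cost_sensitive_gbdt(X[:, selected], y)
predictions = model.predict(X[:, selected])
```

Weighting samples inversely to class frequency is only one simple way to make the boosting loss cost-sensitive; the cost matrix actually used in the paper may differ.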

Keywords