IEEE Access (Jan 2020)
Predicting Drug Risk Level from Adverse Drug Reactions Using SMOTE and Machine Learning Approaches
Abstract
Adverse drug reactions (ADRs) are a major source of morbidity and mortality, yet few studies have predicted drug risk level from ADRs. This study aims to predict drug risk level from ADRs using machine learning approaches. A total of 985,960 ADR reports from 2011 to 2018 were obtained from the Chinese spontaneous reporting database (CSRD) in Jiangsu Province. The drugs in these reports comprised 887 Prescription (Rx) Drugs (84.72%), 113 Over-the-Counter-A (OTC-A) Drugs (10.79%) and 47 OTC-B Drugs (4.49%). Because the classes were imbalanced, an over-sampling method, the Synthetic Minority Oversampling Technique (SMOTE), was applied. First, we proposed a multi-class classification framework based on SMOTE and classifiers. Second, drugs in the CSRD were taken as the samples, and ADR signal values calculated by the proportional reporting ratio (PRR) or the information component (IC) were taken as the features. We then applied four classifiers, Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR) and AdaBoost (ADA), to the labeled data. After evaluating the classification results with specific metrics, we identified the optimal combination for our framework, PRR-SMOTE-RF, with an accuracy of 0.95. We anticipate that this study can serve as a strong auxiliary basis for experts judging whether an Rx Drug should be switched to OTC status.
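As an illustration of the feature and resampling steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the standard 2x2 disproportionality table for PRR, a simplified IC (log2 of observed over expected counts, without the shrinkage prior used in Bayesian variants), and a bare-bones SMOTE that interpolates between a minority sample and one of its nearest minority neighbours. The function names, the cell labels a, b, c, d, and the example counts are hypothetical and chosen only for illustration.

```python
import math
import random


def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 report table (assumed layout).

    a: reports with the drug and the ADR
    b: reports with the drug and other ADRs
    c: reports with other drugs and the ADR
    d: reports with other drugs and other ADRs
    Zero cells are not handled in this sketch.
    """
    return (a / (a + b)) / (c / (c + d))


def ic(a, b, c, d):
    """Simplified information component: log2(observed / expected)."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples from a list of minority feature vectors.

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(base, target)])
    return synthetic


# Hypothetical counts: 0.2 / (100 / 9900), which is approximately 19.8.
example_prr = prr(20, 80, 100, 9800)
example_ic = ic(20, 80, 100, 9800)

# Hypothetical minority-class feature vectors (e.g. signal values per drug).
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_samples, n_new=3)
```

In a full pipeline, the signal values would form one feature column per ADR for each drug, and the oversampled training set would then be passed to a classifier such as a random forest. Oversampling is normally applied to the training split only, so that synthetic samples do not leak into evaluation.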
Keywords