BioMedical Engineering OnLine (Aug 2020)

Differentiating novel coronavirus pneumonia from general pneumonia based on machine learning

  • Chenglong Liu,
  • Xiaoyang Wang,
  • Chenbin Liu,
  • Qingfeng Sun,
  • Wenxian Peng

DOI
https://doi.org/10.1186/s12938-020-00809-9
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 14

Abstract

Background
Chest CT screening is a crucial supplementary means of diagnosing novel coronavirus pneumonia (COVID-19), owing to its high sensitivity and widespread use. Machine learning is adept at discovering intricate structures in CT images and has achieved expert-level performance in medical image analysis.

Methods
An integrated machine learning framework for differentiating COVID-19 from general pneumonia (GP) on chest CT images was developed and validated. Seventy-three confirmed COVID-19 cases were consecutively enrolled, together with 27 confirmed GP patients, at Ruian People’s Hospital between January 2020 and March 2020. To classify COVID-19 accurately, regions of interest (ROIs) were delineated based on ground-glass opacities (GGOs) before feature extraction. Then, 34 statistical texture features were extracted from the COVID-19 and GP ROI images: 13 gray-level co-occurrence matrix (GLCM) features, 15 gray-level-gradient co-occurrence matrix (GLGCM) features, and 6 histogram features. Because high-dimensional feature sets can degrade classification performance, the ReliefF algorithm was used to select features. The relevance of each feature was defined as the average of the weights calculated by ReliefF over n runs, and features with relevance larger than an empirically set threshold T were selected. After feature selection, the optimal feature set, along with four other feature combinations selected for comparison, was applied to an ensemble of bagged trees (EBT) and to four other machine learning classifiers: support vector machine (SVM), logistic regression (LR), decision tree (DT), and K-nearest neighbor with Minkowski distance and equal weighting (KNN). All classifiers were evaluated using tenfold cross-validation.

Results and conclusions
The proposed method yielded a classification accuracy (ACC), sensitivity (SEN), and specificity (SPE) of 94.16%, 88.62%, and 100.00%, respectively, and an area under the receiver operating characteristic curve (AUC) of 0.99. The experimental results indicate that the EBT algorithm with GGO-based statistical texture features achieved high transferability, efficiency, specificity, and sensitivity, as well as impressive accuracy, in differentiating COVID-19 from general pneumonia. This can help inexperienced doctors diagnose COVID-19 more accurately and is essential for controlling the spread of the disease.
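
The feature-selection rule stated in the abstract (average the ReliefF weights over n runs, then keep features whose relevance exceeds a threshold T) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ReliefF weights are taken as given inputs rather than computed, and the feature names, weight values, and threshold below are hypothetical, chosen only to show the averaging and thresholding step.

```python
from statistics import mean


def select_features(weight_runs, names, threshold):
    """Average per-feature weights over several runs and keep the
    features whose mean weight (relevance) is strictly above threshold.

    weight_runs: list of runs, each a list with one weight per feature.
    names:       feature names, in the same order as the weights.
    threshold:   the empirically chosen cut-off T.
    """
    if not weight_runs:
        raise ValueError("at least one run of weights is required")
    if any(len(run) != len(names) for run in weight_runs):
        raise ValueError("every run needs one weight per feature")

    # Relevance of feature j = mean of its weights across all runs.
    relevance = {
        name: mean([run[j] for run in weight_runs])
        for j, name in enumerate(names)
    }
    selected = [name for name in names if relevance[name] > threshold]
    return relevance, selected


# Hypothetical example: n = 3 runs over 4 illustrative texture features.
names = ["glcm_contrast", "glcm_energy", "glgcm_gray_mean", "hist_skewness"]
weight_runs = [
    [0.12, 0.01, 0.08, -0.02],
    [0.10, 0.03, 0.06, 0.00],
    [0.14, 0.02, 0.07, -0.01],
]
relevance, selected = select_features(weight_runs, names, threshold=0.05)
print(selected)
```

In this toy input, only the two features whose averaged weight is above the hypothetical threshold of 0.05 are retained; in practice the weights would come from repeated ReliefF runs on the 34 extracted features.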

Keywords