Algorithms (Mar 2024)

Highly Imbalanced Classification of Gout Using Data Resampling and Ensemble Method

  • Xiaonan Si,
  • Lei Wang,
  • Wenchang Xu,
  • Biao Wang,
  • Wenbo Cheng

DOI
https://doi.org/10.3390/a17030122
Journal volume & issue
Vol. 17, no. 3
p. 122

Abstract

Read online

Gout is one of the most painful diseases in the world. Accurate classification of gout is crucial for diagnosis and treatment which can potentially save lives. However, the current methods for classifying gout periods have demonstrated poor performance and have received little attention. This is due to a significant data imbalance problem that affects the learning attention for the majority and minority classes. To overcome this problem, a resampling method called ENaNSMOTE-Tomek link is proposed. It uses extended natural neighbors to generate samples that fall within the minority class and then applies the Tomek link technique to eliminate instances that contribute to noise. The model combines the ensemble ’bagging’ technique with the proposed resampling technique to improve the quality of generated samples. The performance of individual classifiers and hybrid models on an imbalanced gout dataset taken from the electronic medical records of a hospital is evaluated. The results of the classification demonstrate that the proposed strategy is more accurate than some imbalanced gout diagnosis techniques, with an accuracy of 80.87% and an AUC of 87.10%. This indicates that the proposed algorithm can alleviate the problems caused by imbalanced gout data and help experts better diagnose their patients.

Keywords