IEEE Access (Jan 2023)
Categorical Weighting Domination for Imbalanced Classification With Skin Cancer in Intelligent Healthcare Systems
Abstract
In dermatology, and especially for skin cancer, machine learning (ML) methods are used to classify melanoma and nevus from skin images. ML techniques achieve high accuracy on diagnostic tasks when they are trained on balanced datasets. However, ML models trained on imbalanced datasets produce erroneous results in terms of precision, sensitivity, and specificity. To deal with this problem, an augmentation approach combined with a category seesaw is used for the compensation factor. It increases the penalty for misclassified instances, thereby reducing the occurrence of false positives within the less common categories. This paper presents an approach to improve the efficiency of deep convolutional neural networks (DCNNs) for classifying multi-class medical images on imbalanced datasets. The solution consists of three major contributions: (1) feature extraction based on several backbone models with customized fully connected layers as classifier layers, (2) optimization of the loss function (LF) and training parameters, and (3) handling of imbalanced samples by optimizing the domination of weights between asymmetric classes with majority and minority categories. The method was evaluated and analyzed using the ISIC2018 benchmark and a Chest X-ray dataset. Several well-known backbones were used in this study, e.g., EfficientNets, MobileNets, and DenseNets, to demonstrate that our method is more efficient and stable in both light and heavy DCNN architectures. We also provide comparisons with existing methods that deal with the imbalance problem, e.g., data augmentation (AU), downsampling, customized LFs, and the focal loss (FL) method for focusing on hard samples. Experimental results showed that these methods achieve good performance. However, generating new samples and weighting samples cause several problems, such as data overload when training classifier models and corruption problems when applied to imbalanced data. Moreover, the FL method produced insufficient results on various DCNN backbones. In contrast, our approach handles the imbalanced dataset by boosting the sample weights of minority categories and reducing the impact ratio of samples in majority categories. This strategy results in high precision and stable performance with various DCNN models without augmenting the dataset. Experimental results on the ISIC2018 dataset demonstrated that our approach outperforms other methods on several evaluation criteria: it exceeds the FL method by 2.73% in recall, 2.63% in precision, 2.81% in specificity, and 3.09% in F1 using EfficientNet backbones, and exceeds the AU method by 5.16% in recall, 5.97% in precision, 8.93% in specificity, and 6.16% in F1 using DenseNet backbones.
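The idea of boosting minority-class weights while reducing the influence of majority-class samples can be illustrated with a generic class-weighted cross-entropy. The sketch below is only an illustration under stated assumptions: the abstract does not give the paper's weighting formula, so inverse-frequency weights are used as a stand-in, and the names `class_weights`, `weighted_cross_entropy`, and the `power` parameter are hypothetical.

```python
import math


def class_weights(counts, power=1.0):
    """Inverse-frequency class weights (an assumed stand-in, not the paper's formula).

    counts[i] is the number of training samples in class i. With power=1.0,
    a class holding exactly 1/K of the data gets weight 1; rarer classes get
    weights above 1 and more common classes get weights below 1.
    """
    total = sum(counts)
    num_classes = len(counts)
    return [(total / (num_classes * c)) ** power for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of per-sample cross-entropy.

    probs[n][k] is the predicted probability of class k for sample n,
    labels[n] is the true class index, and weights[k] scales the loss of
    every sample whose true class is k.
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clamp the probability to avoid log(0).
        loss_sum += -w * math.log(max(p[y], 1e-12))
        weight_sum += w
    return loss_sum / weight_sum


# Hypothetical two-class example: 900 majority samples, 100 minority samples.
weights = class_weights([900, 100])

# One well-classified majority sample and one poorly classified minority sample.
probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 1]

plain_loss = weighted_cross_entropy(probs, labels, [1.0, 1.0])
balanced_loss = weighted_cross_entropy(probs, labels, weights)
```

In this toy setting the minority sample's error contributes more to `balanced_loss` than to `plain_loss`, which is the qualitative effect described above. No synthetic samples are generated, so the training set size is unchanged.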
Keywords