Complex & Intelligent Systems (Mar 2024)

The fuzzy support vector data description based on tightness for noisy label detection

  • Xiaoying Wu,
  • Sanyang Liu,
  • Yiguang Bai

DOI
https://doi.org/10.1007/s40747-024-01356-9
Journal volume & issue
Vol. 10, no. 3
pp. 4157 – 4174

Abstract

Machine learning (ML) is a data-driven approach, and as research in the field progresses, the issue of noisy labels has attracted widespread attention. Noisy labels can significantly reduce the accuracy of supervised classification models, so detecting as many of them as possible in large datasets is an important task. In this study, a new method is proposed for detecting noisy labels in datasets. The method first uses a deep pre-trained network to extract features from the image data, which yields more accurate feature representations. Then, a membership degree based on tightness is introduced into the support vector data description (SVDD) model; the resulting model, named TF-SVDD, is used to detect noisy data in the dataset. To simulate different types of label noise more accurately, we first assume that the labels of the datasets used are all correct, and then construct the noise set using two methods: the density peak noise set and the random noise set. Experimental results demonstrate that TF-SVDD can effectively detect noisy label data, surpassing traditional support vector data description algorithms and other methods in terms of outlier detection accuracy, with the average accuracy mostly exceeding 50% and reaching as high as 80%. Furthermore, a novel measure called 'confidence' is employed to correct noisy labels in the data. After the noisy labels are corrected, image classification accuracy improves significantly, with the average promotion ratio mostly exceeding 10% and reaching as high as 30%.
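The evaluation protocol described in the abstract (start from labels assumed clean, inject random label noise, then measure how many injected errors a detector recovers) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names are hypothetical, the detector is a simple mean-centred sphere per class that stands in for a data-description model and is not TF-SVDD, and "detection accuracy" is taken to mean the fraction of injected noisy labels that get flagged, which may differ from the paper's definition.

```python
import math
import random


def inject_random_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a random fraction of labels to a different class.

    Returns the noisy label list and the set of flipped indices.
    """
    rng = random.Random(seed)
    n_noisy = int(round(noise_rate * len(labels)))
    flipped = set(rng.sample(range(len(labels)), n_noisy))
    noisy = list(labels)
    for i in flipped:
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy, flipped


def flag_outside_sphere(features, labels, keep_fraction=0.8):
    """Stand-in detector (NOT TF-SVDD): for each class, centre a sphere on the
    class mean, keep the `keep_fraction` closest points, and flag the rest."""
    flagged = set()
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        dim = len(features[idx[0]])
        center = [sum(features[i][d] for i in idx) / len(idx) for d in range(dim)]
        dist = {i: math.dist(features[i], center) for i in idx}
        ranked = sorted(idx, key=dist.get)  # closest to the centre first
        n_keep = max(1, int(keep_fraction * len(idx)))
        flagged.update(ranked[n_keep:])
    return flagged


def detection_accuracy(flagged, true_noisy):
    """Fraction of injected noisy labels that the detector flagged."""
    if not true_noisy:
        return 0.0
    return len(flagged & true_noisy) / len(true_noisy)


# Toy data: two well-separated 2-D clusters standing in for extracted features.
rng = random.Random(1)
features, clean_labels = [], []
for label, centre in enumerate([(0.0, 0.0), (10.0, 10.0)]):
    for _ in range(20):
        features.append((centre[0] + rng.uniform(-0.5, 0.5),
                         centre[1] + rng.uniform(-0.5, 0.5)))
        clean_labels.append(label)

noisy_labels, flipped = inject_random_noise(clean_labels, 0.1, num_classes=2)
flagged = flag_outside_sphere(features, noisy_labels)
print("detection accuracy:", detection_accuracy(flagged, flipped))
```

The sphere detector also flags some clean points, since it always discards a fixed fraction per class; a real evaluation would report false positives alongside this recall-style score.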

Keywords