The fuzzy support vector data description based on tightness for noisy label detection

Xiaoying Wu; Sanyang Liu; Yiguang Bai

doi:10.1007/s40747-024-01356-9

Complex & Intelligent Systems (Mar 2024)

Affiliations

Xiaoying Wu: School of Mathematics and Statistics, Xidian University
Sanyang Liu: School of Mathematics and Statistics, Xidian University
Yiguang Bai: School of Mathematics and Statistics, Xidian University

DOI: https://doi.org/10.1007/s40747-024-01356-9
Journal volume & issue: Vol. 10, no. 3
pp. 4157 – 4174

Abstract

Machine learning (ML) is a data-driven approach, and as research progresses, the problem of noisy labels has attracted widespread attention. Noisy labels can significantly reduce the accuracy of supervised classification models, so detecting as many of them as possible in large datasets is an important task. This study proposes a new method for detecting noisy labels. The method first uses a deep pre-trained network to extract features from the image data, which yields more accurate feature representations. It then introduces a tightness-based membership degree into the support vector data description (SVDD) model, giving a model named TF-SVDD, to detect noisy data in the dataset. To simulate different types of label noise more accurately, the labels of the datasets used are first assumed to be all correct, and noise sets are then constructed using two methods: the density peak noise set and the random noise set. Experimental results demonstrate that TF-SVDD can effectively detect noisy-label data, surpassing traditional SVDD algorithms and other methods in outlier detection accuracy, with the average accuracy mostly exceeding 50% and even reaching 80%. Furthermore, a novel measure called ‘confidence’ is employed to rectify noisy labels in the data. After the noisy labels are corrected, image classification accuracy improves significantly, with the average promotion ratio mostly exceeding 10% and reaching 30%.
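
As a rough illustration of the random noise set mentioned in the abstract, the sketch below flips the labels of a randomly chosen fraction of samples and scores a detector by how many of the flipped samples it flags. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the `noise_rate` and `seed` parameters, the uniform choice of a wrong class, and the definition of detection accuracy as the fraction of injected noisy samples that are flagged are all hypothetical. The abstract does not specify any of these details.

```python
import random


def inject_random_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a random fraction of labels to a different class.

    Assumption: labels are integers in [0, num_classes) and are treated
    as correct before noise is injected, as the abstract describes.
    Returns the noisy label list and the set of flipped indices.
    """
    rng = random.Random(seed)
    n = len(labels)
    n_noisy = int(round(noise_rate * n))
    noisy_idx = set(rng.sample(range(n), n_noisy))
    noisy_labels = list(labels)
    for i in noisy_idx:
        # Pick uniformly among the classes that differ from the true label.
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        noisy_labels[i] = rng.choice(wrong_classes)
    return noisy_labels, noisy_idx


def detection_accuracy(flagged_idx, noisy_idx):
    """Fraction of injected noisy samples that a detector flagged.

    Assumption: this recall-style definition is illustrative only; the
    paper's exact metric is not given in the abstract.
    """
    if not noisy_idx:
        return 0.0
    return len(set(flagged_idx) & set(noisy_idx)) / len(noisy_idx)
```

A detector such as TF-SVDD would supply `flagged_idx`; here any collection of sample indices can be passed in to check the bookkeeping.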

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
