Deep Cleaner—A Few Shot Image Dataset Cleaner Using Supervised Contrastive Learning

M. B. Bijoy; Bhanu Prakash Pebbeti; A. Sai Manoj; S. Abdul Fathaah; Akash Raut; P. N. Pournami; P. B. Jayaraj

doi:10.1109/ACCESS.2023.3247500

IEEE Access (Jan 2023)

Affiliations

M. B. Bijoy: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
Bhanu Prakash Pebbeti: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
A. Sai Manoj: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
S. Abdul Fathaah: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
Akash Raut: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
P. N. Pournami: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India
P. B. Jayaraj: Department of Computer Science and Engineering, National Institute of Technology at Calicut, Kozhikode, Calicut, India

DOI: https://doi.org/10.1109/ACCESS.2023.3247500
Journal volume & issue: Vol. 11
pp. 18727 – 18738

Abstract

Images are increasingly used for AI-based diagnosis and analysis in applications such as cervical cancer detection, mouth cancer detection, and glucose analysis from the retina. In many cases, the data are collected by specialised camera modules that capture images of the affected areas. Like any other data source, this process is error-prone: the captured images may contain unwanted objects and regions that need to be removed. Outliers in such datasets can adversely affect the performance of machine learning models, so cleaning the data before training is essential. Manual cleaning, however, is tedious, especially when the data is collated from different sources. In this paper, we propose a few-shot learning model, pre-trained in a supervised contrastive learning setting, to automate data cleaning. The model learns the dataset distribution and distinguishes clean data points from noisy ones. We also show that scaling up the model can greatly improve few-shot performance. On the noisy MobileODT cervical dataset, obtained from Kaggle, an EfficientNet classifier achieved 52% accuracy without data cleaning, whereas the same architecture with ROI cropping achieved 76.56% accuracy after the data was cleaned with the proposed Deep Cleaner approach, which requires only 100 clean images. The proposed approach performs 2.74% better than a denoising auto-encoder, which is considered a powerful anomaly detection technique.
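
The abstract names the two ingredients, supervised contrastive pre-training and few-shot cleaning from a small clean set, without giving implementation details. As a rough illustration only, the NumPy sketch below shows a standard supervised contrastive loss over a batch of embeddings, followed by a simple filter that flags candidates whose embedding is far from the centroid of the clean examples. The function names, the centroid-based filter, the temperature, and the similarity threshold are all assumptions made for this example; they are not taken from the paper and may differ from the authors' method.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch)."""
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # Work with unit-length embeddings so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]
    logits = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Shift each row by its maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    # Denominator: every other sample in the batch, excluding the anchor itself.
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    # Positives: other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    mean_log_prob = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_log_prob.mean())


def flag_noisy(clean_embeddings, candidate_embeddings, threshold=0.5):
    """Hypothetical filter: flag candidates far from the clean centroid."""
    clean = np.asarray(clean_embeddings, dtype=np.float64)
    cand = np.asarray(candidate_embeddings, dtype=np.float64)
    clean = clean / np.linalg.norm(clean, axis=1, keepdims=True)
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    prototype = clean.mean(axis=0)
    prototype = prototype / np.linalg.norm(prototype)
    similarity = cand @ prototype
    # True marks a candidate whose cosine similarity falls below the threshold.
    return similarity < threshold
```

In this sketch the loss is small when embeddings with the same label lie close together, and the filter returns a boolean mask in which True marks a candidate image to review or discard. In practice the embeddings would come from the pre-trained encoder, and the threshold would need tuning on held-out data.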

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
