IEEE Access (Jan 2023)

Deep Cleaner—A Few Shot Image Dataset Cleaner Using Supervised Contrastive Learning

  • M. B. Bijoy,
  • Bhanu Prakash Pebbeti,
  • A. Sai Manoj,
  • S. Abdul Fathaah,
  • Akash Raut,
  • P. N. Pournami,
  • P. B. Jayaraj

DOI
https://doi.org/10.1109/ACCESS.2023.3247500
Journal volume & issue
Vol. 11
pp. 18727 – 18738

Abstract

Images are increasingly used for AI-based diagnosis and analysis in many medical applications, such as cervical cancer, mouth cancer, and glucose analysis from the retina. In many cases, data are collected by specialised camera modules that capture images of affected areas. As with any other source of data, this process is error-prone, and the resulting images may contain unwanted objects and regions that need to be removed. Outliers in such datasets may adversely affect the performance of machine learning models, so cleaning the data before training the model is of utmost importance. Manual cleaning, however, is tedious, especially when the data is collated from different sources. In this paper, we propose a Few-Shot learning based model, pre-trained in a supervised contrastive learning setting, to automate the process of data cleaning. Our model learns the dataset distribution and distinguishes accurate data points from noisy ones. We also show that scaling up the model can greatly improve Few-Shot performance. On the noisy MobileODT cervical data, collected from Kaggle, an EfficientNet architecture obtained 52% accuracy on the classification task without data cleaning, whereas the same architecture with ROI cropping achieved 76.56% accuracy after cleaning with the proposed Deep Cleaner approach, which requires only 100 clean images. The proposed approach performs 2.74% better than a denoising auto-encoder, which is considered a powerful anomaly detection technique.

Keywords