A Data-Centric Approach to improve performance of deep learning models

Nikita Bhatt; Nirav Bhatt; Purvi Prajapati; Vishal Sorathiya; Samah Alshathri; Walid El-Shafai

doi:10.1038/s41598-024-73643-x

Scientific Reports (Sep 2024)

Affiliations

Nikita Bhatt: Department of Computer Engineering, U & P U. Patel, CSPIT, CHARUSAT
Nirav Bhatt: Department of Artificial Intelligence and Machine Learning, CSPIT, CHARUSAT
Purvi Prajapati: Smt. K. D. Patel Department of Information Technology, CSPIT, CHARUSAT
Vishal Sorathiya: Faculty of Engineering and Technology, Parul Institute of Engineering and Technology, Parul University
Samah Alshathri: Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University
Walid El-Shafai: Security Engineering Lab, Computer Science Department, Prince Sultan University

DOI: https://doi.org/10.1038/s41598-024-73643-x
Journal volume & issue: Vol. 14, no. 1
pp. 1 – 11

Abstract

Artificial Intelligence (AI) has evolved and is now closely associated with Deep Learning, driven by the availability of vast amounts of data and computing power. Traditionally, researchers have adopted a Model-Centric Approach, focusing on developing new algorithms and models to enhance performance without altering the underlying data. However, Andrew Ng, a prominent figure in the AI community, has recently emphasized better (higher-quality) data rather than better models, which has given rise to the Data-Centric Approach, also known as the Data-Oriented technique. The transition from the model-oriented to the data-oriented approach has rapidly gained momentum within deep learning. Despite its promise, the Data-Centric Approach faces several challenges, including (a) generating high-quality data, (b) ensuring data privacy, and (c) addressing biases to achieve fairness in datasets. So far, there has been limited effort in preparing quality data. Our work aims to address this gap by focusing on the generation of high-quality data through data augmentation, multi-stage hashing to eliminate duplicate instances, and confident learning to detect and correct noisy labels. Experiments were performed on popular datasets, namely MNIST, Fashion-MNIST, and CIFAR-10, using ResNet-18 as the common framework under both the Model-Centric and Data-Centric Approaches. Comparative performance analysis revealed that the Data-Centric Approach consistently outperformed the Model-Centric Approach by a relative margin of at least 3%. This finding highlights the potential for further exploration and adoption of the Data-Centric Approach in domains such as healthcare, finance, education, and entertainment, where data quality could significantly enhance performance.
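The abstract does not say which hashing stages are used to remove duplicate instances. The following is a minimal sketch of one plausible multi-stage scheme, assuming two stages: an exact SHA-256 hash over the raw pixel bytes (catches byte-identical copies), then a perceptual "average hash" compared by Hamming distance (catches near-duplicates such as brightness-shifted copies). The helper names `exact_hash`, `average_hash`, and `deduplicate`, the 8x8 hash grid, and the threshold `max_hamming=2` are hypothetical choices for illustration, not the authors' settings.

```python
import hashlib

import numpy as np


def exact_hash(img: np.ndarray) -> str:
    # Stage 1: SHA-256 over the raw pixel bytes; identical arrays collide.
    return hashlib.sha256(np.ascontiguousarray(img).tobytes()).hexdigest()


def average_hash(img: np.ndarray, size: int = 8) -> int:
    # Stage 2: perceptual "average hash" on a size x size grid of blocks.
    # Channel sums stand in for grayscale; a constant factor does not
    # change the brighter-than-average comparison below.
    gray = img.astype(np.int64)
    if gray.ndim == 3:
        gray = gray.sum(axis=2)
    h, w = gray.shape
    gray = gray[: h - h % size, : w - w % size]  # crop so blocks divide evenly
    blocks = gray.reshape(size, gray.shape[0] // size, size, gray.shape[1] // size)
    block_sums = blocks.sum(axis=(1, 3))
    # Bit is 1 where a block is brighter than the average block
    # (integer arithmetic, so the test is exact).
    bits = (block_sums * block_sums.size > block_sums.sum()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)


def deduplicate(images, max_hamming: int = 2) -> list[int]:
    """Return the indices of images kept after both hashing stages."""
    seen_exact: set[str] = set()
    kept_phashes: list[int] = []
    keep: list[int] = []
    for i, img in enumerate(images):
        eh = exact_hash(img)
        if eh in seen_exact:
            continue  # exact duplicate of an earlier image
        seen_exact.add(eh)
        ph = average_hash(img)
        # Linear scan over kept hashes: fine for a sketch, O(n^2) overall.
        if any(bin(ph ^ k).count("1") <= max_hamming for k in kept_phashes):
            continue  # near-duplicate of an image already kept
        kept_phashes.append(ph)
        keep.append(i)
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.integers(0, 200, size=(28, 28), dtype=np.uint8)
    b = rng.integers(0, 200, size=(28, 28), dtype=np.uint8)
    # a.copy() is an exact duplicate; a + 5 is a brightness-shifted near-duplicate.
    print(deduplicate([a, a.copy(), a + 5, b]))  # -> [0, 3]
```

Running the cheap exact hash first means the costlier perceptual comparison only sees images that are not byte-identical. On a full dataset, the linear scan in stage 2 would typically be replaced by an index over hash values.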

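Confident learning flags examples whose given label disagrees with a confident model prediction. The following is a simplified sketch of its core counting step, not the authors' pipeline. Each class j gets a threshold t_j equal to the average predicted probability of j over examples labelled j. An example counts as "confidently" in class j when its predicted probability for j is at least t_j, and among such classes the most probable one is taken as the suggested label. The function name `flag_label_issues` is hypothetical. The sketch also assumes out-of-sample predicted probabilities (for example, from cross-validation) and omits the calibration and pruning steps of the full method; the open-source cleanlab library provides a more complete implementation.

```python
import numpy as np


def flag_label_issues(labels: np.ndarray, pred_probs: np.ndarray):
    """Simplified confident-learning check.

    labels:     (n,) int array of given, possibly noisy, labels
    pred_probs: (n, k) out-of-sample predicted class probabilities
    Returns (issues, suggested): a boolean mask of suspected label errors
    and a copy of labels with confident corrections applied.
    """
    n, k = pred_probs.shape
    # Per-class threshold t_j: mean self-confidence of examples labelled j.
    # A class with no examples gets threshold 1.0 (effectively never confident).
    thresholds = np.array([
        pred_probs[labels == j, j].mean() if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    above = pred_probs >= thresholds  # (n, k) via broadcasting
    suggested = labels.copy()
    for i in range(n):
        candidates = np.flatnonzero(above[i])
        if candidates.size == 0:
            continue  # not confidently in any class: keep the given label
        # Among classes above threshold, take the most probable one.
        suggested[i] = candidates[np.argmax(pred_probs[i, candidates])]
    issues = suggested != labels
    return issues, suggested


if __name__ == "__main__":
    labels = np.array([0, 0, 0, 1, 1, 1])
    pred_probs = np.array([
        [0.95, 0.05],
        [0.85, 0.15],
        [0.10, 0.90],  # labelled 0, but the model is confident it is class 1
        [0.20, 0.80],
        [0.30, 0.70],
        [0.05, 0.95],
    ])
    issues, suggested = flag_label_issues(labels, pred_probs)
    print(issues.tolist())     # -> [False, False, True, False, False, False]
    print(suggested.tolist())  # -> [0, 0, 1, 1, 1, 1]
```

Per-class thresholds, rather than a single global cutoff, keep the check fair to classes the model is systematically less confident about.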
Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/