Automatic Optimization of Deep Learning Training through Feature-Aware-Based Dataset Splitting

Somayeh Shahrabadi; Telmo Adão; Emanuel Peres; Raul Morais; Luís G. Magalhães; Victor Alves

doi:10.3390/a17030106

Algorithms (Feb 2024)

Affiliations

Somayeh Shahrabadi: Centro de Computação Gráfica—CCG/zgdv, University of Minho, Campus de Azurém, Edifício 14, 4800-058 Guimarães, Portugal
Telmo Adão: ALGORITMI Research Centre/LASI, University of Minho, 4710-057 Guimarães, Portugal
Emanuel Peres: Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
Raul Morais: Department of Engineering, School of Sciences and Technology, University of Trás-os-Montes e Alto Douro, 5000-801 Vila Real, Portugal
Luís G. Magalhães: ALGORITMI Research Centre/LASI, University of Minho, 4710-057 Guimarães, Portugal
Victor Alves: ALGORITMI Research Centre/LASI, University of Minho, 4710-057 Guimarães, Portugal

DOI: https://doi.org/10.3390/a17030106
Journal volume & issue: Vol. 17, no. 3, p. 106

Abstract

The proliferation of classification-capable artificial intelligence (AI) across a wide range of domains (e.g., agriculture and construction) has made it possible to optimize and complement several tasks typically carried out by humans. The training that enables such support is frequently hindered by dataset-related challenges, including the scarcity of examples and imbalanced class distributions, which are detrimental to producing accurate models. Addressing these challenges properly requires strategies smarter than the traditional brute-force K-fold cross-validation or the naive hold-out method, with two main goals in mind: (1) carrying out one-shot, close-to-optimal data arrangements that accelerate conventional training optimization; and (2) maximizing the capacity of inference models while relieving the computational burden. To that end, this paper proposes two image-based, feature-aware dataset splitting approaches, hypothesizing that they contribute to classification models that are closer to their full inference potential. Both rely on strategic image harvesting: one hinges on weighted random selection from a set of feature-based clusters, while the other involves a balanced picking process from a sorted list that stores the distances of data features to the centroid of the whole feature space. Comparative tests on datasets related to grapevine leaf phenotyping and bridge defects show promising results, highlighting a viable alternative to K-fold cross-validation and hold-out methods.
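
The two splitting ideas named in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation. The function names, the `val_fraction` parameter, the even-spacing rule, and the per-cluster quota are all assumptions made for illustration. Feature vectors and cluster labels are assumed to be precomputed (for example, by a feature extractor and a clustering step that are not shown).

```python
import math
import random
from collections import defaultdict


def distance_sorted_split(features, val_fraction=0.2):
    """Hypothetical sketch: pick validation samples at evenly spaced
    positions of a list sorted by distance to the feature-space centroid."""
    n = len(features)
    dim = len(features[0])
    # Centroid of the whole feature space (mean of every coordinate).
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    # Sample indices ordered from closest to farthest from the centroid.
    order = sorted(range(n), key=lambda i: math.dist(features[i], centroid))
    n_val = max(1, round(n * val_fraction))
    # Evenly spaced positions, so validation covers near, middle, and far
    # samples alike (assumed meaning of "balanced picking").
    step = n / n_val
    val_positions = {int(k * step + step / 2) for k in range(n_val)}
    val = [order[p] for p in sorted(val_positions)]
    train = [order[p] for p in range(n) if p not in val_positions]
    return train, val


def cluster_weighted_split(cluster_labels, val_fraction=0.2, seed=0):
    """Hypothetical sketch: draw validation samples at random from each
    cluster, with a quota proportional to the cluster size."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        clusters[label].append(index)
    val = []
    for members in clusters.values():
        # At least one sample per cluster, but never the whole cluster,
        # so every cluster stays represented in training.
        quota = min(len(members) - 1, max(1, round(len(members) * val_fraction)))
        val.extend(rng.sample(members, quota))
    val_set = set(val)
    train = [i for i in range(len(cluster_labels)) if i not in val_set]
    return train, sorted(val)


if __name__ == "__main__":
    toy_features = [[float(i)] for i in range(10)]
    print(distance_sorted_split(toy_features, val_fraction=0.2))
    toy_labels = [0] * 10 + [1] * 5
    print(cluster_weighted_split(toy_labels, val_fraction=0.2, seed=0))
```

Both functions return index lists rather than copies of the data, so the same split can be reused for images, labels, and features. In this sketch a cluster with a single member is kept entirely in the training subset.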

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
