Data Augmentation Techniques for Machine Learning Applied to Optical Spectroscopy Datasets in Agrifood Applications: A Comprehensive Review

Ander Gracia Moisés; Ignacio Vitoria Pascual; José Javier Imas González; Carlos Ruiz Zamarreño

doi:10.3390/s23208562

Sensors (Oct 2023)

Affiliations

Ander Gracia Moisés: Department of Electrical, Electronic and Communications Engineering, Public University of Navarra, Campus Arrosadía, 31006 Pamplona, NA, Spain
Ignacio Vitoria Pascual: Department of Electrical, Electronic and Communications Engineering, Public University of Navarra, Campus Arrosadía, 31006 Pamplona, NA, Spain
José Javier Imas González: Department of Electrical, Electronic and Communications Engineering, Public University of Navarra, Campus Arrosadía, 31006 Pamplona, NA, Spain
Carlos Ruiz Zamarreño: Department of Electrical, Electronic and Communications Engineering, Public University of Navarra, Campus Arrosadía, 31006 Pamplona, NA, Spain

DOI: https://doi.org/10.3390/s23208562
Journal volume & issue: Vol. 23, no. 20, p. 8562

Abstract

Machine learning (ML) and deep learning (DL) have achieved great success in a variety of tasks, including computer vision, image segmentation, natural language processing, classification, time series evaluation, and the prediction of values from a set of variables. As artificial intelligence progresses, new techniques are being applied to areas like optical spectroscopy and its uses in specific fields, such as the agrifood industry. The performance of ML and DL techniques generally improves with the amount of data available. However, it is not always possible to obtain all the data needed to create a robust dataset. In the particular case of agrifood applications, dataset collection is generally constrained to specific periods. Weather conditions can also make it difficult to cover the entire range of classes, which results in imbalanced datasets. To address this issue, data augmentation (DA) techniques are employed to expand the dataset by adding slightly modified copies of existing data. This leads to a dataset that includes values from laboratory tests as well as synthetic data derived from the real data. This review presents the application of DA techniques to optical spectroscopy datasets obtained from real agrifood industry applications. The reviewed methods range from simple DA techniques, such as duplicating samples with slight changes, to more complex deep learning algorithms based on generative adversarial networks (GANs) and semi-supervised generative adversarial networks (SGANs).
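As an illustration of the simplest kind of DA mentioned in the abstract (duplicating samples with slight changes to offset class imbalance), the following is a minimal sketch and is not taken from the review. The function names, the choice of perturbations (a small baseline offset, a small intensity scaling, and Gaussian noise), and all parameter values are assumptions made for the example.

```python
import random


def augment_spectrum(spectrum, rng, noise_std=0.01, max_offset=0.02, max_scale=0.05):
    """Return a slightly modified copy of one spectrum (a list of intensities).

    The perturbation magnitudes are illustrative assumptions, not values
    recommended by the review.
    """
    offset = rng.uniform(-max_offset, max_offset)     # additive baseline shift
    scale = 1.0 + rng.uniform(-max_scale, max_scale)  # multiplicative intensity change
    return [scale * value + offset + rng.gauss(0.0, noise_std) for value in spectrum]


def balance_by_augmentation(spectra, labels, seed=0):
    """Add augmented copies until every class matches the size of the largest class."""
    rng = random.Random(seed)

    # Group the real spectra by class label.
    by_class = {}
    for spectrum, label in zip(spectra, labels):
        by_class.setdefault(label, []).append(spectrum)

    target = max(len(group) for group in by_class.values())

    # Keep all real samples, then append synthetic ones for minority classes.
    out_spectra, out_labels = list(spectra), list(labels)
    for label, group in by_class.items():
        for _ in range(target - len(group)):
            source = rng.choice(group)  # pick a real spectrum to perturb
            out_spectra.append(augment_spectrum(source, rng))
            out_labels.append(label)
    return out_spectra, out_labels
```

In a sketch like this, the real measurements are kept unchanged and synthetic spectra are only appended, so the resulting dataset mixes laboratory values with data derived from them, as the abstract describes. In practice, augmentation would normally be applied to the training split only, so that synthetic copies do not leak into evaluation data.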

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
