Exploring Data Augmentation Algorithm to Improve Genomic Prediction of Top-Ranking Cultivars

Osval A. Montesinos-López; Arvinth Sivakumar; Gloria Isabel Huerta Prado; Josafhat Salinas-Ruiz; Afolabi Agbona; Axel Efraín Ortiz Reyes; Khalid Alnowibet; Rodomiro Ortiz; Abelardo Montesinos-López; José Crossa

doi:10.3390/a17060260

Algorithms (Jun 2024)

Affiliations

Osval A. Montesinos-López: Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico
Arvinth Sivakumar: ICAR—Indian Agricultural Research Institute, Pusa Campus, New Delhi 110012, India
Gloria Isabel Huerta Prado: Independent Researcher, Zinacatepec 75960, Mexico
Josafhat Salinas-Ruiz: Colegio de Postgraduados Campus Córdoba, Km. 348 Carretera Federal Córdoba-Veracruz, Amatlán de los Reyes, Veracruz 94946, Mexico
Afolabi Agbona: International Institute of Tropical Agriculture (IITA), Ibadan 200001, Nigeria
Axel Efraín Ortiz Reyes: Facultad de Telemática, Universidad de Colima, Colima 28040, Mexico
Khalid Alnowibet: Department of Statistics and Operations Research, King Saud University, Riyadh 11459, Saudi Arabia
Rodomiro Ortiz: Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), P.O. Box SE 23436 Lomma, Sweden
Abelardo Montesinos-López: Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
José Crossa: Distinguished Scientist Fellowship Program, King Saud University, Riyadh 11459, Saudi Arabia

DOI: https://doi.org/10.3390/a17060260
Journal volume & issue: Vol. 17, no. 6, p. 260

Abstract

Genomic selection (GS) is a groundbreaking statistical machine learning method for advancing plant and animal breeding. Nonetheless, its practical implementation remains challenging due to numerous factors affecting its predictive performance. This research explores the potential of data augmentation to enhance prediction accuracy across entire datasets and specifically within the top 20% of the testing set. Our findings indicate that, overall, the data augmentation method (method A), when compared to the conventional model (method C) and assessed using Mean Arctangent Absolute Prediction Error (MAAPE) and normalized root mean square error (NRMSE), did not improve the prediction accuracy for the unobserved cultivars. However, significant improvements in prediction accuracy (evidenced by reduced prediction error) were observed when data augmentation was applied exclusively to the top 20% of the testing set. Specifically, reductions in MAAPE_20 and NRMSE_20 by 52.86% and 41.05%, respectively, were noted across various datasets. Further investigation is needed to refine data augmentation techniques for effective use in genomic prediction.
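
As an illustration of the two error metrics named in the abstract, the following minimal Python sketch computes MAAPE and NRMSE on a full testing set and on its top 20%. It is not the authors' code and rests on several assumptions: MAAPE is taken in its commonly used form, the mean of arctan(|(y - ŷ) / y|); NRMSE is assumed to be the RMSE divided by the mean of the observed values (other normalizations exist); and the "top 20%" is assumed to mean the testing lines with the highest observed values. The function names `maape`, `nrmse`, and `top_fraction` are hypothetical.

```python
import math


def maape(y_true, y_pred):
    """Mean of arctan(|(y - p) / y|), in radians (range 0 to pi/2)."""
    terms = []
    for y, p in zip(y_true, y_pred):
        if y == 0:
            # Limit case: arctan(inf) = pi/2 for a non-zero error, 0 for an exact hit.
            terms.append(0.0 if p == 0 else math.pi / 2)
        else:
            terms.append(math.atan(abs((y - p) / y)))
    return sum(terms) / len(terms)


def nrmse(y_true, y_pred):
    """RMSE divided by the mean observed value (assumed normalization).

    Requires a non-zero mean of y_true.
    """
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (sum(y_true) / len(y_true))


def top_fraction(y_true, y_pred, fraction=0.2):
    """Keep the (observed, predicted) pairs with the highest observed values.

    Ranking by observed value is an assumption about how the top 20% is chosen.
    """
    k = max(1, math.ceil(fraction * len(y_true)))
    ranked = sorted(zip(y_true, y_pred), key=lambda pair: pair[0], reverse=True)[:k]
    return [y for y, _ in ranked], [p for _, p in ranked]


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    observed = [5.1, 6.3, 4.8, 7.2, 5.9, 6.8, 4.5, 5.5, 6.1, 7.0]
    predicted = [5.4, 6.0, 5.0, 6.5, 5.8, 6.6, 4.9, 5.6, 6.0, 6.4]

    top_obs, top_pred = top_fraction(observed, predicted, fraction=0.2)
    print("MAAPE:", maape(observed, predicted))
    print("NRMSE:", nrmse(observed, predicted))
    print("MAAPE_20:", maape(top_obs, top_pred))
    print("NRMSE_20:", nrmse(top_obs, top_pred))
```

Lower values of either metric indicate smaller prediction error, so the reductions in MAAPE_20 and NRMSE_20 reported above correspond to better predictions for the top-ranking cultivars.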

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
