Genomic prediction with machine learning in sugarcane, a complex highly polyploid clonally propagated crop with substantial non‐additive variation for key traits

Chensong Chen; Owen Powell; Eric Dinglasan; Elizabeth M. Ross; Seema Yadav; Xianming Wei; Felicity Atkin; Emily Deomano; Ben J. Hayes

doi:10.1002/tpg2.20390

The Plant Genome (Dec 2023)

Affiliations

Chensong Chen: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
Owen Powell: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
Eric Dinglasan: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
Elizabeth M. Ross: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
Seema Yadav: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia
Xianming Wei: Sugar Research Australia, Mackay, Australia
Felicity Atkin: Sugar Research Australia, Gordonvale, Australia
Emily Deomano: Sugar Research Australia, Indooroopilly, Australia
Ben J. Hayes: Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Queensland, Australia

DOI: https://doi.org/10.1002/tpg2.20390
Journal volume & issue: Vol. 16, no. 4

Abstract

Sugarcane has a complex, highly polyploid genome with multi‐species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning (ML) methods, which are purportedly able to deal with high levels of complexity in prediction. Here, we investigated deep learning (DL) neural networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and an ensemble machine learning approach, random forest (RF), for genomic prediction in sugarcane. The data set comprised 2912 sugarcane clones, scored for 26,086 genome‐wide single nucleotide polymorphism markers, with final assessment trial data for total cane harvested (TCH), commercial cane sugar (CCS), and fiber content (Fiber). The clones in the latest trial (2017) were used as a validation set. We compared the prediction accuracy of these methods with that of genomic best linear unbiased prediction (GBLUP) extended to include dominance and epistatic effects. The prediction accuracies from GBLUP models were up to 0.37 for TCH, 0.43 for CCS, and 0.48 for Fiber, while the optimized ML models had prediction accuracies of 0.35 for TCH, 0.38 for CCS, and 0.48 for Fiber. Both RF and DL neural network models had predictive ability comparable to that of the additive GBLUP model but were less accurate than the extended GBLUP model.
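
The evaluation design described in the abstract (train on earlier trials, validate on the most recent one, and report a prediction accuracy) can be illustrated with a minimal additive GBLUP sketch. This is not the authors' pipeline. It assumes simulated diploid-style 0/1/2 marker codes rather than polyploid dosages, a VanRaden-type additive relationship matrix, a fixed heritability `h2` instead of estimated variance components, and prediction accuracy taken as the Pearson correlation between predicted and observed phenotypes in the validation set. All function and parameter names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Additive genomic relationship matrix from an (n_clones x n_markers) 0/1/2 matrix."""
    p = M.mean(axis=0) / 2.0                  # allele frequency per marker
    Z = M - 2.0 * p                           # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y_train, train_idx, val_idx, h2=0.5):
    """Predict validation clones from training phenotypes (h2 is an assumed value)."""
    lam = (1.0 - h2) / h2                     # residual-to-genetic variance ratio
    mu = y_train.mean()
    G_tt = G[np.ix_(train_idx, train_idx)]    # training-by-training relationships
    G_vt = G[np.ix_(val_idx, train_idx)]      # validation-by-training relationships
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha


def simulate(n=300, m=200, h2=0.5, seed=0):
    """Simulate markers and a purely additive trait (illustration only)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=m)
    M = rng.binomial(2, freqs, size=(n, m))
    g = M @ rng.normal(size=m)
    g = (g - g.mean()) / g.std()              # genetic values with unit variance
    y = g + rng.normal(scale=np.sqrt((1.0 - h2) / h2), size=n)
    return M, y


def prediction_accuracy(pred, obs):
    """Pearson correlation between predicted and observed values."""
    return float(np.corrcoef(pred, obs)[0, 1])


if __name__ == "__main__":
    M, y = simulate()
    n_val = 60                                # last 60 clones stand in for the latest trial
    train_idx = np.arange(len(y) - n_val)
    val_idx = np.arange(len(y) - n_val, len(y))
    G = vanraden_grm(M)
    pred = gblup_predict(G, y[train_idx], train_idx, val_idx)
    print("prediction accuracy:", prediction_accuracy(pred, y[val_idx]))
```

Extending such a sketch to dominance or epistatic effects, as in the extended GBLUP model mentioned above, would require additional relationship matrices and variance components, which are omitted here.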

Published in The Plant Genome

ISSN: 1940-3372 (Online)
Publisher: Wiley
Country of publisher: United States
LCC subjects: Agriculture: Plant culture; Science: Biology (General): Genetics
Website: https://acsess.onlinelibrary.wiley.com/journal/19403372
