Stacked kinship CNN vs. GBLUP for genomic predictions of additive and complex continuous phenotypes

Nelson Nazzicari; Filippo Biscarini

doi:10.1038/s41598-022-24405-0

Scientific Reports (Nov 2022)

Affiliations

Nelson Nazzicari: CREA Council for Agricultural Research and Analysis of Agricultural Economics, Research Centre for Animal Production and Aquaculture
Filippo Biscarini: CNR: National Research Council, Institute of Agricultural Biology and Biotechnology

DOI: https://doi.org/10.1038/s41598-022-24405-0
Journal volume & issue: Vol. 12, no. 1, pp. 1–15

Abstract

Deep learning is impacting many fields of data science, often with spectacular results. However, its application to whole-genome predictions in plant and animal science or in human biology has been rather limited, with mostly underwhelming results. While most works focus on exploring alternative network architectures, in this study we propose an innovative representation of marker genotype data and test it against the GBLUP (Genomic BLUP) benchmark with linear and nonlinear phenotypes. From publicly available cattle SNP genotype data, different types of genomic kinship matrices are stacked together in a 3D pile from which 2D grayscale slices are extracted and fed to a deep convolutional neural network (DNN). We simulated nine phenotype scenarios with combinations of additivity, dominance and epistasis, and compared the DNN to GBLUP-A (computed using only the additive kinship matrix) and GBLUP-optim (additive, dominance, and epistasis kinship matrices, as needed). Results varied depending on the accuracy metric employed, with the DNN performing better in terms of root mean squared error (1–12% lower than GBLUP-A; 1–9% lower than GBLUP-optim) but worse in terms of Pearson’s correlation (0.505 for DNN compared to 0.672 for GBLUP-A and 0.669 for GBLUP-optim in the fully additive case; 0.274 for DNN, 0.279 for GBLUP-A, and 0.477 for GBLUP-optim in the fully dominant case). The proposed approach offers a basis to further explore the application of DNN to tabular data in whole-genome predictions.
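
As a rough illustration of the data representation described in the abstract, the following Python sketch stacks kinship matrices into a 3D pile and extracts one 2D slice per individual. Everything specific here is an assumption, since the abstract does not give formulas: the additive matrix uses a VanRaden-style definition, the dominance matrix uses a common heterozygosity-based coding, the epistasis matrix is approximated by the Hadamard product of the additive matrix with itself, and the slice is taken as row i of every matrix. The function names and the toy genotypes are hypothetical and are not the cattle data used in the study. The last lines show how the two accuracy metrics named in the abstract can rank the same predictions differently.

```python
import numpy as np


def additive_kinship(genotypes):
    """Additive kinship from 0/1/2 allele counts (VanRaden-style; assumed here)."""
    p = genotypes.mean(axis=0) / 2.0           # allele frequency per marker
    z = genotypes - 2.0 * p                    # centre each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def dominance_kinship(genotypes):
    """Dominance kinship from centred heterozygote indicators (assumed coding)."""
    p = genotypes.mean(axis=0) / 2.0
    het = 2.0 * p * (1.0 - p)                  # expected heterozygosity per marker
    h = (genotypes == 1).astype(float) - het   # 1 - 2pq if heterozygous, else -2pq
    return h @ h.T / np.sum(het * (1.0 - het))


def stack_kinships(matrices):
    """Pile k kinship matrices of shape (n, n) into one (k, n, n) array."""
    return np.stack(matrices, axis=0)


def slice_for_individual(stack, i):
    """Return a (k, n) 2D slice: row i of every kinship matrix in the pile."""
    return stack[:, i, :]


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)))


def pearson(y, y_hat):
    """Pearson's correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1])


# Toy genotypes: 20 individuals x 100 markers (random, for illustration only).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(20, 100)).astype(float)

a = additive_kinship(genotypes)
d = dominance_kinship(genotypes)
pile = stack_kinships([a, d, a * a])   # a * a: Hadamard product as an epistasis proxy
image = slice_for_individual(pile, 0)  # shape (3, 20): one grayscale "image"

# The two metrics can disagree: a perfectly ranked but badly scaled prediction
# has Pearson's correlation 1 and a large RMSE, while a prediction shrunk
# towards the mean has a lower RMSE and a much lower correlation.
y = np.array([1.0, 2.0, 3.0, 4.0])
perfectly_ranked = 2.0 * y
shrunk = np.array([2.6, 2.4, 2.6, 2.6])
print(rmse(y, perfectly_ranked), pearson(y, perfectly_ranked))
print(rmse(y, shrunk), pearson(y, shrunk))
```

In this reading, each individual contributes one small image whose height is the number of kinship matrices and whose width is the number of individuals; the layout actually used in the paper may differ.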

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
