Crop Journal (Jun 2024)
Leveraging the potential of big genomic and phenotypic data for genome-wide association mapping in wheat
Abstract
Genome-wide association mapping studies (GWAS) based on Big Data are a potential approach to improve marker-assisted selection in plant breeding. The number of available phenotypic and genomic data sets in which medium-sized populations of several hundred individuals have been studied is rapidly increasing. Combining these data and using them in GWAS could increase both the power of QTL discovery and the accuracy of estimation of the underlying genetic effects, but is hindered by data heterogeneity and a lack of interoperability. In this study, we used genomic and phenotypic data sets, focusing on Central European winter wheat populations evaluated for heading date. We explored strategies for integrating these data and then assessed the potential of the integrated data for GWAS. Establishing interoperability between data sets was greatly aided by some overlapping genotypes and a linear relationship between the different phenotyping protocols, resulting in high-quality integrated phenotypic data. In this context, genomic prediction proved to be a suitable tool to study the relevance of interactions between genotypes and experimental series, which was low in our case. Contrary to expectations, fewer marker-trait associations were found in the larger combined data set than in the individual experimental series. However, the predictive power based on the marker-trait associations of the integrated data set was higher across data sets. Therefore, the results show that integrating medium-sized data sets into Big Data is an approach to increase the power to detect QTL in GWAS. The results encourage further efforts to standardize and share data in the plant breeding community.
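The role of overlapping genotypes and a linear relationship between phenotyping protocols can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`fit_linear_calibration`, `harmonize`), the two scales, and all values are hypothetical and chosen only to show how genotypes scored in two experimental series could be used to map one series onto the scale of the other with an ordinary least-squares line.

```python
import numpy as np


def fit_linear_calibration(reference, other):
    """Fit reference ~ slope * other + intercept using genotypes scored in both series."""
    shared = sorted(set(reference) & set(other))
    if len(shared) < 2:
        raise ValueError("need at least two overlapping genotypes")
    x = np.array([other[g] for g in shared], dtype=float)
    y = np.array([reference[g] for g in shared], dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
    return slope, intercept


def harmonize(reference, other):
    """Return one phenotype table on the reference scale.

    Genotypes already in the reference series keep their reference value;
    genotypes found only in the other series are converted with the fitted line.
    """
    slope, intercept = fit_linear_calibration(reference, other)
    combined = dict(reference)
    for genotype, value in other.items():
        if genotype not in combined:
            combined[genotype] = slope * value + intercept
    return combined


# Invented toy data (assumption): series A records heading date as day of year,
# series B uses a different protocol with an arbitrary score.
series_a = {"G1": 150.0, "G2": 154.0, "G3": 158.0, "G4": 152.0}
series_b = {"G2": 4.0, "G3": 6.0, "G4": 3.0, "G5": 5.0}

slope, intercept = fit_linear_calibration(series_a, series_b)
combined = harmonize(series_a, series_b)
```

In this toy case G2, G3 and G4 are the overlapping genotypes, and G5, scored only in series B, is placed on the series A scale. With real data the fit would be noisy, so the quality of the linear relationship should be checked before merging.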