Journal of Dairy Science (Nov 2022)
Rapid turnover of sensor data to genetic evaluation for dairy cows in the cloud
Abstract
More and more sensor and automation data are becoming available, enabling animal breeders to define novel traits. However, sensor and automation data are often measured frequently and in different ways (e.g., milk yield and different milk components are continuously measured during each milking). These differences challenge animal breeders to define traits and to choose the most appropriate analytical models for genetic evaluation and estimation of breeding values. Traditionally, the process from raw data to breeding value estimation involves several steps: data curation, trait definition, variance component estimation, genetic evaluation, and validation of the estimated breeding values (EBV). These steps often take many iterations and several research projects before the final genetic evaluations are optimized. To make this entire process, from raw data to validated EBV, more efficient, we combined all these steps in a cloud environment that allows for faster processing and faster data distribution. We used real data (including 1,782,373,113 daily milk-yield records of 1,120,550 dairy cows) and a real trait (a resilience trait based on the deviations from expected milk yields) to demonstrate the functioning of this cloud environment. The daily milk-yield records were loaded into our cloud solution, in which we set up central binary large object storage. All subsequent steps were performed in the cloud. The data set was preprocessed in approximately 6 h to obtain the resilience indicator for 352,871 cows in the first 3 lactations. Genetic parameters (heritabilities and genetic correlations) were estimated in ASReml after splitting the data into 5 subsets, and EBV were subsequently predicted on the entire data set using MiXBLUP. Together with the validation of breeding values, this process took 16.5 h.
By combining the different steps from preprocessing sensor data to genetic evaluation of new traits in one cloud environment, we generated EBV and validation plots in approximately 1 working day. Moreover, our setup is flexible and can easily be adapted to test new longitudinal sensor-driven traits and to compare their performance with that of previous traits.
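To make the trait definition more concrete, the following is a minimal sketch of how a resilience indicator based on deviations from expected milk yields might be computed for one lactation. The abstract does not specify the method, so everything here is an assumption made for illustration: the expected yield is approximated by a centered moving average, the indicator is taken as the natural log of the variance of the deviations, and the names `expected_yield`, `resilience_indicator`, and `window` are hypothetical.

```python
import math
from statistics import mean, variance


def expected_yield(yields, window=5):
    """Approximate the expected daily yield with a centered moving average.

    Assumption: a moving average stands in for whatever expected-yield
    model is actually fitted; it is used here only for illustration.
    """
    half = window // 2
    expected = []
    for i in range(len(yields)):
        lo = max(0, i - half)
        hi = min(len(yields), i + half + 1)
        expected.append(mean(yields[lo:hi]))
    return expected


def resilience_indicator(yields, window=5):
    """Return the natural log of the variance of deviations from expected yield.

    Assumption: lower values indicate fewer or smaller disturbances in
    daily milk yield. Raises ValueError if the deviations have no variance.
    """
    if len(yields) < 2:
        raise ValueError("at least two daily records are required")
    expected = expected_yield(yields, window)
    deviations = [obs - exp for obs, exp in zip(yields, expected)]
    var = variance(deviations)
    if var <= 0:
        raise ValueError("deviations have zero variance; log is undefined")
    return math.log(var)


# Hypothetical daily milk yields (kg) for two cows over ten days.
stable_cow = [30.0, 30.2, 29.9, 30.1, 30.0, 29.8, 30.1, 30.0, 29.9, 30.1]
disturbed_cow = [30.0, 30.2, 22.0, 25.0, 30.0, 29.8, 18.0, 27.0, 29.9, 30.1]

if __name__ == "__main__":
    print("stable cow:   ", resilience_indicator(stable_cow))
    print("disturbed cow:", resilience_indicator(disturbed_cow))
```

In this sketch the cow with sharp drops in yield receives a higher indicator value than the cow with a steady yield. In a pipeline such as the one described, a function of this kind would be applied per cow and lactation during preprocessing, before variance component estimation and genetic evaluation.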