Smart Agricultural Technology (Aug 2024)
Effects of dataset curation on body condition score (BCS) determination with a vision transformer (ViT) applied to RGB+depth images
Abstract
Body condition score (BCS) has long been a useful tool for estimating the health of cattle. This categorical metric requires experienced observers to visually inspect cows and assess body fat deposits regularly, a time-consuming and subjective process. Low-cost RGB+depth cameras have previously been paired with machine learning algorithms and have shown great promise; however, more advanced techniques are expected to yield better performance. In this work, a vision transformer (ViT) is pretrained with masked image modeling, a recently developed self-supervised method, and then fine-tuned on RGB+depth BCS data with the objective of improving performance. Model accuracy was found to be highly dependent on dataset curation, ranging from 64% to 92%. These discrepancies are attributed to non-unique data in the training and test splits and to an inherently unbalanced dataset, both of which are discussed in detail. It is recommended that engineers and animal scientists collaborate more closely, as certain details of dataset curation are critical to thoroughly assessing the performance and robustness of automated methods for BCS determination.
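The two curation issues named above can be illustrated with a minimal sketch. It is not the procedure used in this work. It assumes that "non-unique data" means images of the same animal landing in both splits, and it uses hypothetical names (`cow_ids`, `split_by_group`, `balanced_accuracy`) and made-up toy labels. The first function keeps every animal entirely in one split; the second reports mean per-class recall, which is less flattering than plain accuracy when one BCS class dominates.

```python
import random
from collections import defaultdict


def split_by_group(groups, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group appears in both splits."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy data: one animal identifier per image (assumption, not from the paper).
cow_ids = ["a", "a", "a", "b", "b", "c", "c", "d", "e", "e"]
train_idx, test_idx = split_by_group(cow_ids, test_fraction=0.4)

# Hypothetical unbalanced labels: a model that always predicts the majority class.
y_true = [3, 3, 3, 3, 3, 3, 3, 3, 2, 4]
y_pred = [3] * len(y_true)
plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain_acc:.2f}, balanced accuracy: {bal_acc:.2f}")
```

In the toy labels, the majority-class predictor looks strong under plain accuracy but weak under balanced accuracy, which is the kind of gap an unbalanced BCS dataset can hide. Libraries such as scikit-learn offer comparable group-aware splitters and a balanced accuracy metric; the standard-library version here is only meant to keep the sketch self-contained.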