PLoS ONE (Jan 2022)
Multicow pose estimation based on keypoint extraction
Abstract
Automatic estimation of dairy cow poses over long periods can provide relevant information about their status and well-being in precision farming. Because individual cows are similar in appearance, multicow pose estimation is challenging. To monitor the health of dairy cows in actual farm environments, a multicow pose estimation algorithm was proposed in this study. First, a monitoring system was established at a dairy cow breeding site, and 175 surveillance videos of 10 different cows were used as raw data to construct object detection and pose estimation data sets. To detect multiple cows, a You Only Look Once (YOLO)v4 model with a CSPDarkNet53 backbone was built and fine-tuned to output bounding boxes for subsequent pose estimation. On a test set of 400 images containing single and multiple cows captured throughout the day, the average precision (AP) reached 94.58%. Second, keypoint heatmaps and part affinity fields (PAFs) were extracted to match the keypoints belonging to the same cow, following the real-time multiperson 2D pose detection model. To verify the performance of the algorithm, 200 single-object images and 200 dual-object images with occlusions were tested under different lighting conditions. The results showed that the AP of the leg keypoints was the highest, reaching 91.6%, regardless of whether the images were captured during the day or at night and whether they contained one or two cows, followed by the AP values of the back, neck and head, in that order. The AP of single-cow pose estimation was 85% during the day and 78.1% at night, whereas for double cows with occlusion the values were 74.3% and 71.6%, respectively. The keypoint detection rate decreased when the occlusion was severe; however, in actual cow breeding sites, cows are seldom strongly occluded. Finally, a pose classification network was built to estimate three typical cow poses (standing, walking and lying) from the extracted skeleton within each bounding box, achieving precisions of 91.67%, 92.97% and 99.23%, respectively. The results showed that the proposed algorithm achieved a relatively high detection rate. Therefore, the proposed method can provide a theoretical reference for animal pose estimation in large-scale precision livestock farming.
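To make the three-stage pipeline summarized above more concrete, the following is a minimal sketch of the final stage: normalizing the keypoints extracted inside a detected bounding box and classifying the resulting skeleton into one of the three poses. All class, function and keypoint names here are hypothetical placeholders rather than the authors' implementation, and the toy linear classifier merely stands in for the trained pose classification network described in the paper.

```python
# Hypothetical sketch: bounding box + keypoints -> normalized skeleton -> pose label.
# The detector (YOLOv4) and keypoint extractor (heatmaps + PAFs) are assumed to run upstream.
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class CowDetection:
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2) in image pixels
    keypoints: np.ndarray                    # (K, 2) keypoint coordinates in image pixels


def normalize_keypoints(det: CowDetection) -> np.ndarray:
    """Scale keypoints into the unit square of their bounding box so the
    classifier sees a translation- and scale-invariant skeleton."""
    x1, y1, x2, y2 = det.box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    kp = det.keypoints.astype(np.float64).copy()
    kp[:, 0] = (kp[:, 0] - x1) / w
    kp[:, 1] = (kp[:, 1] - y1) / h
    return kp.flatten()                      # feature vector of length 2K


def classify_pose(features: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> str:
    """Toy linear classifier over the flattened skeleton; a stand-in for the
    trained pose classification network, not the authors' model."""
    labels = ("standing", "walking", "lying")
    scores = features @ weights + bias       # (3,) class scores
    return labels[int(np.argmax(scores))]


if __name__ == "__main__":
    # Dummy single-cow example with K = 8 keypoints (head, neck, back, legs).
    rng = np.random.default_rng(0)
    det = CowDetection(box=(100, 80, 400, 260),
                       keypoints=rng.uniform(100, 400, size=(8, 2)))
    feats = normalize_keypoints(det)
    w = rng.normal(size=(feats.size, 3))
    b = np.zeros(3)
    print(classify_pose(feats, w, b))
```

Normalizing each skeleton to its own bounding box, as sketched here, is one plausible way to keep the classifier independent of where the cow appears in the frame and how large it is; the exact feature encoding used in the paper may differ.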