Data (May 2022)
Unsupervised Few-Shot Key Frame Extraction for Cow Teat Videos
Abstract
A novel method of monitoring the health of dairy cows in large-scale dairy farms is proposed via image-based analysis of cows on rotary-based milking platforms, where deep learning is used to classify the extent of teat-end hyperkeratosis. The videos can be analyzed to segment the teats for feature analysis, which can then be used to assess the risk of infections and other diseases. This analysis can be performed more efficiently by using the key frames of each cow as it passes through the image frame. Extracting key frames from these videos would greatly simplify this analysis, but several challenges arise. First, data collection in the farm setting is harsh, resulting in unpredictable temporal key frame positions; empty, obfuscated, or shifted images of the cow’s teats; frequently empty stalls due to challenges with herding cows into the parlor; and regular interruptions and reversals in the direction of the parlor. Second, supervised learning requires expensive and time-consuming human annotation of key frames, which is impractical in large commercial dairy farms housing thousands of cows. Unsupervised learning methods rely on large frame differences and often suffer from low performance. In this paper, we propose a novel unsupervised few-shot learning model which extracts key frames from large (∼21,000 frames) video streams. Using a simple L1 distance metric that combines both image and deep features between each unlabeled frame and a few (32) labeled key frames, a key frame selection mechanism, and a quality check process, key frames can be extracted with sufficient accuracy (F score 63.6%) and timeliness (<10 min per 21,000 frames) to meet the demands of a commercial dairy farm setting.
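The combined distance described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weighting `alpha`, the feature dimensionality, and the helper names are assumptions, and a real system would obtain the deep features from a pretrained network rather than take them as inputs.

```python
import numpy as np

def combined_l1_distance(frame, key_frame, frame_feat, key_feat, alpha=0.5):
    """L1 distance mixing a raw-image term and a deep-feature term.

    `alpha` is an illustrative weight (assumption); the abstract only
    states that image and deep features are combined via L1 distance.
    """
    img_d = np.abs(frame.astype(np.float64) - key_frame.astype(np.float64)).mean()
    feat_d = np.abs(np.asarray(frame_feat, dtype=np.float64)
                    - np.asarray(key_feat, dtype=np.float64)).mean()
    return alpha * img_d + (1.0 - alpha) * feat_d

def nearest_key_frame(frame, frame_feat, key_frames, key_feats):
    """Index of, and distance to, the closest of the few labeled key frames."""
    dists = [combined_l1_distance(frame, kf, frame_feat, ff)
             for kf, ff in zip(key_frames, key_feats)]
    i = int(np.argmin(dists))
    return i, dists[i]
```

In the paper's setting the second argument set would hold the 32 labeled key frames; each unlabeled frame is then scored against all of them, and low-distance frames become key frame candidates for the subsequent selection and quality check steps.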
Keywords