Ecosphere (Mar 2021)
Frame‐by‐frame annotation of video recordings using deep neural networks
Abstract
Video data are widely collected in ecological studies, but manual annotation is a challenging, time‐consuming task that has become a bottleneck for scientific research. Classification models based on convolutional neural networks (CNNs) have proved successful in annotating images, but few applications have extended these to video classification. We demonstrate an approach that combines a standard CNN summarizing each video frame with a recurrent neural network (RNN) that models the temporal component of video. The approach is illustrated using two datasets: one collected by static video cameras detecting seal activity inside coastal salmon nets and another collected by animal‐borne cameras deployed on African penguins, used to classify behavior. For the penguin data, the combined RNN‐CNN reduced test set classification error by 25% relative to an image‐only model (accuracy improved from 80% to 85%) and substantially improved classification precision or recall for four of six behavior classes (12–17%). Image‐only and video models classified seal activity with very similar accuracy (88% and 89%), and neither model missed any seal visit entirely. Temporal patterns related to movement provide valuable information about animal behavior, and classifiers benefit from including these explicitly. We recommend including temporal information whenever manual inspection suggests that movement is predictive of class membership.
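The abstract describes the architecture at a high level: a CNN encodes each frame, and an RNN reads the resulting feature sequence to produce frame‐by‐frame labels. The sketch below illustrates this general pattern; it is not the authors' implementation. It assumes PyTorch, a pretrained ResNet-18 as the frame encoder, an LSTM as the temporal model, and the six penguin behavior classes as the output size — all of these specifics are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models


class CNNRNNClassifier(nn.Module):
    """Per-frame CNN features pooled over time by an RNN (illustrative sketch)."""

    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        # Pretrained CNN as the per-frame feature extractor (frozen here,
        # so only the temporal model and classifier head are trained).
        cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.feature_dim = cnn.fc.in_features  # 512 for ResNet-18
        cnn.fc = nn.Identity()                 # drop the ImageNet head
        for p in cnn.parameters():
            p.requires_grad = False
        self.cnn = cnn
        # RNN models the temporal dependence between consecutive frames.
        self.rnn = nn.LSTM(self.feature_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w))   # (b*t, feature_dim)
        feats = feats.reshape(b, t, self.feature_dim)
        out, _ = self.rnn(feats)                          # (b, t, hidden_size)
        return self.head(out)                             # per-frame class logits


# Example: two 16-frame clips at 224x224, classified into 6 behavior classes.
model = CNNRNNClassifier(num_classes=6)
logits = model(torch.randn(2, 16, 3, 224, 224))           # -> shape (2, 16, 6)
```

Emitting one logit vector per time step matches the frame‐by‐frame annotation task: each frame receives a label, but that label is informed by the preceding frames through the RNN's hidden state.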
Keywords