Methods in Ecology and Evolution (Jan 2023)

Can CNN‐based species classification generalise across variation in habitat within a camera trap survey?

  • Danielle L. Norman,
  • Philipp H. Bischoff,
  • Oliver R. Wearn,
  • Robert M. Ewers,
  • J. Marcus Rowcliffe,
  • Benjamin Evans,
  • Sarab Sethi,
  • Philip M. Chapman,
  • Robin Freeman

DOI
https://doi.org/10.1111/2041-210X.14031
Journal volume & issue
Vol. 14, no. 1
pp. 242 – 251

Abstract

Camera trap surveys are a popular ecological monitoring tool, but they produce vast numbers of images, making annotation extremely time‐consuming. Advances in machine learning, in the form of convolutional neural networks, have demonstrated potential for automated image classification, reducing processing time. These networks often generalise poorly, however, which could impact assessments of species in habitats undergoing change. Here, we (i) compare the performance of three network architectures in identifying species in camera trap images taken from tropical forest of varying disturbance intensities; (ii) explore the impacts of training dataset configuration; (iii) use habitat disturbance categories to investigate network generalisability; and (iv) test whether classification performance and generalisability improve when using images cropped to bounding boxes. Overall accuracy (72.8%) was improved by excluding the rarest species and by adding extra training images (76.3% and 82.8%, respectively). Generalisability to new camera locations within a disturbance level was poor (mean F1‐score: 0.32). Performance across unseen habitat disturbance levels was worse (mean F1‐score: 0.27). Training the network on multiple disturbance levels improved generalisability (mean F1‐score on unseen disturbance levels: 0.41). Cropping images to bounding boxes improved overall performance (F1‐score: 0.77 vs. 0.47) and generalisability (mean F1‐score on unseen disturbance levels: 0.73), but at the cost of losing images containing animals that the detector failed to detect. These results suggest that researchers should consider using an object detector before passing images to a classifier, and that classification may improve if labelled images from other studies are added to the training data.
The composition of the training data was shown to be influential, but including rarer classes did not compromise performance on common classes, which supports the inclusion of rare species to inform conservation efforts. These findings have important implications for the use of these methods in long‐term monitoring of habitats undergoing change, as they highlight the potential for misclassifications caused by poor generalisability to affect subsequent ecological analyses. These methods therefore need to be treated as dynamic, in that changes to the study site would need to be reflected in updated training of the network.
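
The abstract reports both overall accuracy and mean F1‐scores, and the two can diverge when some species are rare. The following is a minimal sketch of that difference, assuming that "mean F1‐score" is an unweighted average of per-class F1 over species; the abstract does not state the averaging scheme. The function names and species labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: one common species and one rare species.
y_true = ["deer", "deer", "deer", "deer", "civet", "civet"]
y_pred = ["deer", "deer", "deer", "deer", "deer", "civet"]

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"mean F1:  {mean_f1(y_true, y_pred):.3f}")
```

In this toy case a single error on the rare class lowers the mean F1 more than it lowers accuracy, because each class contributes equally to the mean regardless of how many images it has.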

Keywords