Ecological Indicators (Aug 2022)

Automated detection of gunshots in tropical forests using convolutional neural networks

  • Lydia K.D. Katsis,
  • Andrew P. Hill,
  • Evelyn Piña-Covarrubias,
  • Peter Prince,
  • Alex Rogers,
  • C. Patrick Doncaster,
  • Jake L. Snaddon

Journal volume & issue
Vol. 141
p. 109128

Abstract

Unsustainable hunting is one of the leading drivers of global biodiversity loss, yet very few direct measures exist because this cryptic activity is difficult to monitor. Where guns are commonly used for hunting, such as in the tropical forests of the Americas and Africa, acoustic detection can potentially provide a solution to this monitoring challenge. The emergence of low-cost autonomous recording units (ARUs) brings into reach the ability to monitor hunting pressure over wide spatial and temporal scales. However, ARUs produce immense amounts of data, and long-term, large-scale monitoring is not possible without efficient automated sound classification techniques. We tested the effectiveness of a sequential two-stage detection pipeline for detecting gunshots from acoustic data collected in the tropical forests of Belize. The pipeline involved an on-board detection algorithm, which was developed and tested in a prior study, followed by a spectrogram-based convolutional neural network (CNN), which was developed in this study. As gunshots are rare events, we focussed on developing a classification pipeline that maximises recall at the cost of increased false positives, with the aim of using the classifier to assist human annotation of files. We trained the CNN on annotated data collected across two study sites in Belize, comprising 597 gunshots and 28,195 background sounds. Predictions on the annotated validation dataset, comprising 150 gunshots and 7044 background sounds collected from the same sites, yielded a recall of 0.95 and a precision of 0.85. The combined recall of the two-stage pipeline was estimated at 0.80. We subsequently applied the CNN to an unannotated dataset of over 160,000 files collected in a spatially distinct study site to test for generalisability and precision under a more realistic monitoring scenario. Our model was able to generalise to this dataset, classifying gunshots with a precision of 0.57 and an estimated recall of 0.80, and producing a substantially more manageable dataset for human verification. Using a classifier-guided listening approach such as ours can make wide-scale monitoring of threats such as hunting a feasible option for conservation management.
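
The recall and precision figures above can be related with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that the recall of a sequential pipeline is the product of the per-stage recalls (each stage only sees what the previous stage let through). Under that assumption, a combined recall of 0.80 and a CNN recall of 0.95 would imply an on-board detector recall of roughly 0.84; that value is an inference from the reported numbers, not a figure stated in the abstract.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of files flagged as gunshots that really are gunshots."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real gunshots that the classifier flags."""
    return true_pos / (true_pos + false_neg)


def pipeline_recall(stage_recalls: list[float]) -> float:
    """Recall of sequential stages, ASSUMING it is the product of the
    per-stage recalls (a gunshot must pass every stage to be detected)."""
    combined = 1.0
    for r in stage_recalls:
        combined *= r
    return combined


def implied_stage_recall(combined: float, known_stage: float) -> float:
    """Recall of the other stage in a two-stage pipeline, under the same
    product assumption as pipeline_recall."""
    return combined / known_stage


def files_reviewed_per_gunshot(prec: float) -> float:
    """Average number of flagged files a human checks per confirmed gunshot."""
    return 1.0 / prec


if __name__ == "__main__":
    # Figures reported in the abstract: CNN recall 0.95, combined recall 0.80.
    onboard = implied_stage_recall(combined=0.80, known_stage=0.95)
    print(f"implied on-board recall: {onboard:.2f}")
    # Precision of 0.57 on the unannotated dataset.
    print(f"files per confirmed gunshot: {files_reviewed_per_gunshot(0.57):.2f}")
```

Read this way, a precision of 0.57 means a reviewer listens to fewer than two flagged files per confirmed gunshot, which is the sense in which the classifier output is more manageable than the full set of over 160,000 files.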

Keywords