Animal Detection and Classification from Camera Trap Images Using Different Mainstream Object Detection Architectures
Mengyu Tan,
Wentao Chao,
Jo-Ku Cheng,
Mo Zhou,
Yiwen Ma,
Xinyi Jiang,
Jianping Ge,
Lian Yu,
Limin Feng
Affiliations
Mengyu Tan
Ministry of Education Key Laboratory for Biodiversity Science and Engineering, National Forestry and Grassland Administration Key Laboratory for Conservation Ecology of Northeast Tiger and Leopard National Park, Northeast Tiger and Leopard Biodiversity National Observation and Research Station, National Forestry and Grassland Administration Amur Tiger and Amur Leopard Monitoring and Research Center, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Wentao Chao
School of Artificial Intelligence, Beijing Normal University, Beijing 100875, China
Jo-Ku Cheng
School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
Mo Zhou
School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
Yiwen Ma
Ministry of Education Key Laboratory for Biodiversity Science and Engineering, National Forestry and Grassland Administration Key Laboratory for Conservation Ecology of Northeast Tiger and Leopard National Park, Northeast Tiger and Leopard Biodiversity National Observation and Research Station, National Forestry and Grassland Administration Amur Tiger and Amur Leopard Monitoring and Research Center, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Xinyi Jiang
School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
Jianping Ge
Ministry of Education Key Laboratory for Biodiversity Science and Engineering, National Forestry and Grassland Administration Key Laboratory for Conservation Ecology of Northeast Tiger and Leopard National Park, Northeast Tiger and Leopard Biodiversity National Observation and Research Station, National Forestry and Grassland Administration Amur Tiger and Amur Leopard Monitoring and Research Center, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Lian Yu
School of Mathematical Sciences, Beijing Normal University, Beijing 100875, China
Limin Feng
Ministry of Education Key Laboratory for Biodiversity Science and Engineering, National Forestry and Grassland Administration Key Laboratory for Conservation Ecology of Northeast Tiger and Leopard National Park, Northeast Tiger and Leopard Biodiversity National Observation and Research Station, National Forestry and Grassland Administration Amur Tiger and Amur Leopard Monitoring and Research Center, College of Life Sciences, Beijing Normal University, Beijing 100875, China
Camera traps are widely used in wildlife surveys and biodiversity monitoring. Depending on their triggering mechanism, they can accumulate large numbers of images or videos. Several studies have proposed applying deep learning techniques to automatically identify wildlife in camera trap imagery, which can significantly reduce manual work and speed up analysis. However, few studies have validated and compared the applicability of different object detection models in real field monitoring scenarios. In this study, we first constructed a wildlife image dataset of the Northeast Tiger and Leopard National Park (NTLNP dataset). We then evaluated the recognition performance of three mainstream object detection architectures and compared models trained on day and night data separately with models trained on both together. We selected the YOLOv5 series models (anchor-based one-stage), Cascade R-CNN with the HRNet32 feature extractor (anchor-based two-stage), and FCOS with the ResNet50 and ResNet101 feature extractors (anchor-free one-stage). The experimental results showed that the object detection models trained jointly on day and night data performed satisfactorily. Specifically, our models achieved on average 0.98 mAP (mean average precision) in animal image detection and 88% accuracy in animal video classification. The one-stage YOLOv5m achieved the best recognition accuracy. With the help of AI technology, ecologists can potentially extract information from large volumes of imagery quickly and efficiently, saving considerable time.
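For readers unfamiliar with the mAP metric reported above, the following is a minimal sketch of how mean average precision can be computed from ranked detections. It is an illustration only and not the evaluation code used in this study. It assumes that each detection has already been matched to ground truth and flagged as a true or false positive (for example, by an IoU threshold), and it uses all-point interpolation of the precision-recall curve. The abstract does not state which mAP variant was used. The function names `average_precision` and `mean_average_precision` and the class labels in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

# One detection for a single class: (confidence score, is_true_positive).
# The true/false-positive flag is assumed to come from an earlier matching
# step against ground-truth boxes, which is not shown here.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Replace each precision with the maximum precision at any higher rank,
    # so the curve is non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(
    detections_by_class: Dict[str, List[Detection]],
    ground_truth_counts: Dict[str, int],
) -> float:
    """Mean of the per-class average precisions over all classes with ground truth."""
    aps = [
        average_precision(detections_by_class.get(name, []), count)
        for name, count in ground_truth_counts.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0
```

A call might look like `mean_average_precision({"class_a": [(0.9, True), (0.8, False), (0.7, True)]}, {"class_a": 2})`, where the placeholder label `class_a` stands in for a species category. Averaging over classes gives every species equal weight regardless of how often it appears in the imagery.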