Frontiers in Marine Science (Nov 2024)
Localization and tracking of beluga whales in aerial video using deep learning
Abstract
Aerial images are increasingly adopted and widely used in various research areas. In marine mammal studies, these imagery surveys serve multiple purposes: determining population size, mapping migration routes, and gaining behavioral insights. A single aerial scan using a drone yields a wealth of data, but processing it requires significant human effort. Our research demonstrates that deep learning models can significantly reduce human effort. They are not only able to detect marine mammals but also track their behavior using continuous aerial (video) footage. By distinguishing between different age classes, these algorithms can inform studies on population biology, ontogeny, and adult-calf relationships. To detect beluga whales from imagery footage, we trained the YOLOv7 model on a proprietary dataset of aerial footage of beluga whales. The deep learning model achieved impressive results with the following precision and recall scores: beluga adult = 92%—92%, beluga calf = 94%—89%. To track the detected beluga whales, we implemented the deep Simple Online and Realtime Tracking (SORT) algorithm. Unfortunately, the performance of the deep SORT algorithm was disappointing, with Multiple Object Tracking Accuracy (MOTA) scores ranging from 27% to 48%. An analysis revealed that the low tracking accuracy resulted from identity switching; that is, an identical beluga whale was given two IDs in two different frames. To overcome the problem of identity switching, a new post-processing algorithm was implemented, significantly improving MOTA to approximately 70%. The main contribution of this research is providing a system that accurately detects and tracks features of beluga whales, both adults and calves, from aerial footage. Additionally, this system can be customized to identify and analyze other marine mammal species by fine-tuning the model with annotated data.
Keywords