Animals (Oct 2024)

CECS-CLIP: Fusing Domain Knowledge for Rare Wildlife Detection Model

  • Feng Yang,
  • Chunying Hu,
  • Aokang Liang,
  • Sheng Wang,
  • Yun Su,
  • Fu Xu

DOI
https://doi.org/10.3390/ani14192909
Journal volume & issue
Vol. 14, no. 19
p. 2909

Abstract

Accurate and efficient wildlife monitoring is essential for conservation efforts. Traditional image-based methods often struggle to detect small, occluded, or camouflaged animals due to the challenges posed by complex natural environments. To overcome these limitations, this study proposes an innovative multimodal target detection framework that integrates textual information from an animal knowledge base as supplementary features to enhance detection performance. First, a concept enhancement module was developed, employing a cross-attention mechanism to fuse features based on the correlation between textual and image features, thereby obtaining enhanced image features. Second, a feature normalization module was developed, amplifying cosine similarity and introducing learnable parameters to continuously weight and transform image features, further enhancing their expressive power in the feature space. Rigorous experimental validation on a specialized dataset provided by the research team at Northwest A&F University demonstrates that our multimodal model achieved a 0.3% improvement in precision over single-modal methods. Compared to existing multimodal target detection algorithms, this model achieved at least a 25% improvement in AP and excelled in detecting small targets of certain species, significantly surpassing existing multimodal detection benchmarks. This study offers a multimodal target detection model that integrates textual and image information for the conservation of rare and endangered wildlife, providing strong evidence and new perspectives for research in this field.
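The cross-attention fusion described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the general technique (image features as queries, text features as keys/values, with a residual connection), not the authors' implementation; all function names, shapes, and the residual-fusion choice are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(img_feats, txt_feats):
    """Fuse text features into image features via cross-attention.

    img_feats: (N_img, d) image patch features (queries)
    txt_feats: (N_txt, d) text/concept features (keys and values)
    Returns enhanced image features of shape (N_img, d).
    """
    d = txt_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)  # (N_img, N_txt) similarities
    attn = softmax(scores, axis=-1)                # attention over text concepts
    return img_feats + attn @ txt_feats            # residual fusion

# Toy shapes: 4 image patch features, 3 text-concept features, dim 8
rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))
txt = rng.standard_normal((3, 8))
out = cross_attention_fuse(img, txt)
print(out.shape)  # (4, 8)
```

In a full model the queries, keys, and values would each pass through learned projection matrices; they are omitted here to keep the correlation-weighted fusion step visible.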

Keywords