Remote Sensing (Sep 2024)
Unifying Building Instance Extraction and Recognition in UAV Images
Abstract
Building instance extraction and recognition (BEAR) extracts and further recognizes building instances in unmanned aerial vehicle (UAV) images, holds with paramount importance in urban understanding applications. To address this challenge, we propose a unified network, BEAR-Former. Given the difficulty of building instance recognition due to the small area and multiple instances in UAV images, we developed a novel multi-view learning method, Cross-Mixer. This method constructs a cross-regional branch and an intra-regional branch to, respectively, extract the global context dependencies and local spatial structural details of buildings. In the cross-regional branch, we cleverly employed cross-attention and polar coordinate relative position encoding to learn more discriminative features. To solve the BEAR problem end to end, we designed a channel group and fusion module (CGFM) as a shared encoder. The CGFM includes a channel group encoder layer to independently extract features and a channel fusion module to dig out the complementary information for multiple tasks. Additionally, an RoI enhancement strategy was designed to improve model performance. Finally, we introduced a new metric, Recall@(K, iou), to evaluate the performance of the BEAR task. Experimental results demonstrate the effectiveness of our method.
Keywords