International Journal of Digital Earth (Dec 2023)
Scene-level buildings damage recognition based on Cross Conv-Transformer
Abstract
Unlike pixel-based and object-based image recognition, a scene-based perspective covers a larger area and can therefore improve the efficiency of assessing building damage at large scale. However, the complexity of disaster scenes and the scarcity of datasets remain major challenges in identifying building damage. To address these challenges, we propose the Cross Conv-Transformer model to classify and evaluate the degree of damage to buildings in aerial images taken after earthquakes. We employ Conv-Embedding and Conv-Projection to extract features from the images; integrating convolution with the Transformer reduces the model's computational burden while enhancing its feature-extraction capability. Furthermore, we design a two-branch Conv-Transformer architecture with global and local attention, so that one branch focuses on global features and the other on local features. A cross-attention fusion module then merges feature information from the two branches to enrich the classification features. Finally, we use aerial images captured after the Beichuan and Yushu earthquakes as the training and test sets to evaluate the model. The proposed Cross Conv-Transformer model improves classification accuracy by 4.7% and 2.1% compared with ViT and EfficientNet, respectively. The results show that the Cross Conv-Transformer model can significantly reduce misclassification between the severely damaged and moderately damaged categories.
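The abstract does not give the equations of the cross-attention fusion module. The following is only a minimal sketch of one common way to fuse two Transformer branches. It assumes a CrossViT-style scheme in which the class (CLS) token of each branch, taken here as token index 0, attends to the tokens of the other branch. It also assumes identity query/key/value projections, so there are no learned weights, a single attention head, and a residual connection. The two enriched CLS vectors are concatenated into the classification feature. The function names (cross_attention, fuse_branches) are hypothetical and do not come from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product a @ b; zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: Vector) -> Vector:
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    # Scaled dot-product attention softmax(Q K^T / sqrt(d)) V, where the
    # queries come from one branch and K = V = tokens of the other branch.
    # Identity projections are an assumption made to keep the sketch small.
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(queries, transpose(context))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, context)


def fuse_branches(global_tokens: Matrix, local_tokens: Matrix) -> Vector:
    # Hypothetical fusion: each branch's CLS token (index 0) queries the
    # other branch's tokens, and the result is added back residually.
    g_update = cross_attention([global_tokens[0]], local_tokens)[0]
    l_update = cross_attention([local_tokens[0]], global_tokens)[0]
    g_cls = [a + b for a, b in zip(global_tokens[0], g_update)]
    l_cls = [a + b for a, b in zip(local_tokens[0], l_update)]
    # Concatenate both enriched CLS vectors as the feature for the classifier.
    return g_cls + l_cls


if __name__ == "__main__":
    global_branch = [[0.0, 0.0], [3.0, 4.0]]              # CLS + 1 patch token
    local_branch = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]   # CLS + 2 patch tokens
    print(fuse_branches(global_branch, local_branch))
```

In a trained model, the queries, keys and values would come from learned projections, and the fused feature would feed a classification head over the damage categories. Here, the CLS-token-only query keeps the cost of cross-attention linear rather than quadratic in the number of tokens, which is the usual motivation for this style of fusion.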
Keywords