International Journal of Applied Earth Observations and Geoinformation (Sep 2022)
RoadFormer: Pyramidal deformable vision transformers for road network extraction with remote sensing images
Abstract
Complete and accurate road network information underpins numerous transportation-related applications, so road network inventories need to be updated regularly and rapidly. Remote sensing images, with their overhead view of the earth's surface, have been widely used to assist road network interpretation. However, accurately separating roads from the surrounding land cover in remote sensing images, while preserving connectivity and completeness, remains an open problem because of the challenging conditions under which roads appear. To address this, we develop a pyramidal deformable vision transformer architecture, termed RoadFormer, to extract road networks from remote sensing images. Specifically, a multi-context patch embedding scheme adopts a multi-range, multi-view context observation strategy to produce higher-quality token embeddings. Furthermore, a deformable transformer architecture attends to semantically relevant features in a sparse global manner, which improves the quality and robustness of the feature representation. RoadFormer is evaluated on three large-scale road network extraction datasets. Quantitative assessments show that RoadFormer achieves an overall intersection over union (IoU) of 0.8886 and an overall F1-score of 0.9407. In addition, comparative evaluations demonstrate the promising potential and superior performance of RoadFormer in interpreting road sections of varying conditions under diverse challenging image scenarios.
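
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how IoU and F1-score are commonly computed for a binary road mask. It is not the authors' evaluation code: the function name `iou_and_f1`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, and the abstract does not state whether the reported figures are pooled over pixels or averaged over images or datasets.

```python
def iou_and_f1(pred, truth):
    """Compute IoU and F1 for two binary masks given as 2D lists of 0/1.

    Hypothetical helper for illustration only; road pixels are labeled 1.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # road predicted, road in reference
            elif p and not t:
                fp += 1  # road predicted, background in reference
            elif t and not p:
                fn += 1  # road missed by the prediction

    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    return iou, f1


# Toy 3x3 masks (made-up data, not from the paper).
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 1, 1]]
truth = [[1, 1, 0],
         [0, 1, 0],
         [0, 0, 1]]

iou, f1 = iou_and_f1(pred, truth)
print(f"IoU = {iou:.4f}, F1 = {f1:.4f}")
```

On a single mask the two metrics are tied by F1 = 2·IoU / (1 + IoU), so they rank predictions identically; the relation need not hold exactly for values averaged over several images or datasets.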