IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

A Transformer-Based Multimodal Model for Urban–Rural Fringe Identification

  • Furong Jia,
  • Quanhua Dong,
  • Zhou Huang,
  • Xiao-Jian Chen,
  • Yi Wang,
  • Xia Peng,
  • Yuan Guo,
  • Ruixian Ma,
  • Fan Zhang,
  • Yu Liu

DOI
https://doi.org/10.1109/JSTARS.2024.3439429
Journal volume & issue
Vol. 17
pp. 15041 – 15051

Abstract

As the frontier of urbanization, urban–rural fringes (URFs) form the transition between urban construction regions and the rural hinterland, and their identification is significant for the study of urbanization-related socioeconomic changes and human dynamics. Previous research on URF identification has predominantly relied on remote sensing data, which often provides a uniform overhead perspective with limited spatial resolution. As an additional data source, street view images (SVIs) offer a valuable human-related perspective, efficiently capturing intricate transitions from urban to rural areas. However, the abundant visual information offered by SVIs has often been overlooked, and multimodal techniques have seldom been explored to integrate multisource data for delineating URFs. To address this gap, this study proposes a transformer-based multimodal methodology for identifying URFs, which includes a street view panorama classifier and a remote sensing classification model. In the study area of Beijing, the experimental results indicate that a URF with a total area of 731.24 $\text{km}^{2}$ surrounds urban cores, primarily located between the fourth and sixth ring roads. The effectiveness of the proposed method is demonstrated through comparative experiments with traditional URF identification methods. In addition, a series of ablation studies demonstrates the efficacy of incorporating multisource data. Based on the delineated URFs in Beijing, this research introduces points of interest data and commuting data to analyze the socioeconomic characteristics of URFs. The findings indicate that URFs are characterized by longer commuting distances and less diverse restaurant consumption patterns compared to more urbanized regions. This study enables the accurate identification of URFs through the transformer-based multimodal approach integrating SVIs. Furthermore, it provides a human-centric comprehension of URFs, which is essential for informing strategies of urban planning and development.

Keywords