IEEE Access (Jan 2024)

Automatic Estimation for Visual Quality Changes of Street Space via Street-View Images and Multimodal Large Language Models

  • Hao Liang,
  • Jiaxin Zhang,
  • Yunqin Li,
  • Bowen Wang,
  • Jingyong Huang

DOI
https://doi.org/10.1109/ACCESS.2024.3408843
Journal volume & issue
Vol. 12
pp. 87713–87727

Abstract

Estimating the Visual Quality of Street Space (VQoSS) is pivotal for urban design, environmental sustainability, and civic engagement. Recent advancements, notably in deep learning, have enabled large-scale analysis, but traditional deep learning approaches are hampered by extensive data annotation requirements and limited adaptability across diverse VQoSS tasks. Multimodal Large Language Models (MLLMs) have recently demonstrated proficiency across a range of computer vision tasks, positioning them as promising tools for automated VQoSS assessment. In this paper, we pioneer the application of MLLMs to VQoSS change estimation, and our empirical findings affirm their effectiveness. In addition, we introduce the Street Quality Generative Pre-trained Transformer (SQ-GPT), a model that distills knowledge from GPT-4V, currently the most capable but paid, closed-access MLLM, without requiring any human annotation effort. SQ-GPT approaches GPT-4V's performance while remaining viable for large-scale VQoSS change estimation. In a case study of Nanjing, we demonstrate the practicality of SQ-GPT and our knowledge distillation pipeline. Our work promises to be a valuable asset for future urban studies research.
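The annotation-free distillation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the teacher stands in for a GPT-4V API call that compares two street-view images of the same location, and all function names, labels, and filenames are hypothetical assumptions.

```python
# Hypothetical sketch: a strong teacher (GPT-4V in the paper) pseudo-labels
# (before, after) street-view image pairs, producing a training set for a
# lightweight student (SQ-GPT) with no human annotation in the loop.
from typing import Callable, List, Tuple

Label = str  # e.g. "improved", "declined", "unchanged" (illustrative classes)


def build_pseudo_labels(
    pairs: List[Tuple[str, str]],
    teacher: Callable[[str, str], Label],
) -> List[Tuple[Tuple[str, str], Label]]:
    """Query the teacher on every image pair; its answers become the labels."""
    return [(pair, teacher(*pair)) for pair in pairs]


def mock_gpt4v(before: str, after: str) -> Label:
    # Stand-in for a real multimodal API call; here a toy rule on filenames.
    return "improved" if "green" in after else "unchanged"


pairs = [
    ("street1_2015.jpg", "street1_2023_green.jpg"),
    ("street2_2015.jpg", "street2_2023.jpg"),
]
dataset = build_pseudo_labels(pairs, mock_gpt4v)
# `dataset` would then be used to fine-tune the student model (SQ-GPT),
# which can afterwards score city-scale imagery without further API cost.
```

The design point is that the expensive, closed-access teacher is queried once to create supervision, after which all large-scale inference runs on the distilled student.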

Keywords