Fine-grained ship image classification and detection based on a vision transformer and multi-grain feature vector FPN model

Fengxiang Wang; Deying Yu; Liang Huang; Yalun Zhang; Yongbing Chen; Zhiguo Wang

doi:10.1080/10095020.2024.2331552

Geo-spatial Information Science (Apr 2024)

Affiliations

Fengxiang Wang: State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, China
Deying Yu: School of Electrical Engineering, Naval University of Engineering, Wuhan, China
Liang Huang: College of Electronic Engineering, Naval University of Engineering, Wuhan, China
Yalun Zhang: Combat Command Department, People’s Liberation Army Naval Command College, Nanjing, China
Yongbing Chen: School of Electrical Engineering, Naval University of Engineering, Wuhan, China
Zhiguo Wang: Department of Operational Research and Planning, Naval University of Engineering, Wuhan, China

Abstract

In naval and civilian domains, meticulous ship classification and detection are paramount. Nevertheless, predominant research has gravitated toward leveraging Convolutional Neural Network (CNN)-centered methodologies, often overlooking the diverse granularity inherent in ship samples. In our pursuit to holistically extract features from ship images across varying granularities, we present a transformative architecture: the Vision Transformer and Multi-Grain Feature Vector Feature Pyramid Network (ViT-MGFV-FPN). This model synergistically melds the merits of MGFV-FPN with an augmented Vision Transformer (ViT) for comprehensive image feature extraction. To cater to the extraction of broader image features whilst sidestepping the innate quadratic complexity of traditional ViT, we unveil an enhanced version christened the Global Swin Transformer. Concurrently, the MGFV-FPN is orchestrated to harness the prowess of CNNs in distilling intricate ship attributes. Rigorous empirical evaluations underscore our model’s superiority in juxtaposition with extant CNN and transformer-based paradigms for nuanced ship categorization.
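
The quadratic complexity mentioned in the abstract can be illustrated with a short count of attention pairs. This is only a sketch of the general contrast between full self-attention and non-overlapping window attention of the kind used in Swin-style models; it is not the authors' Global Swin Transformer, whose details are not given here. The function names, the token grid sizes, and the window size of 7 are assumptions chosen for illustration.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Token pairs scored by full self-attention on a height x width token grid."""
    n = height * width
    return n * n  # every token attends to every token: quadratic in n


def windowed_attention_pairs(height: int, width: int, window: int) -> int:
    """Token pairs scored when attention stays inside non-overlapping
    window x window blocks (illustrative, not the paper's exact scheme)."""
    if height % window or width % window:
        raise ValueError("grid must divide evenly into windows")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    # Each window is a small full-attention problem; the total is linear in n.
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Hypothetical grid sizes; window=7 is an assumed value.
    for side in (14, 28, 56):
        g = global_attention_pairs(side, side)
        w = windowed_attention_pairs(side, side, window=7)
        print(f"{side}x{side} tokens: global={g}, windowed={w}")
```

Under these assumptions, doubling the side of the token grid multiplies the global count by 16 but the windowed count by only 4, which is why restricting attention to windows avoids quadratic growth in the number of tokens.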

Published in Geo-spatial Information Science

ISSN: 1009-5020 (Print); 1993-5153 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Geography. Anthropology. Recreation: Mathematical geography. Cartography; Science: Astronomy: Geodesy
Website: https://www.tandfonline.com/journals/tgsi
