IEEE Access (Jan 2021)

Domain Adaptation Deep Attention Network for Automatic Logo Detection and Recognition in Google Street View

  • Ervin Yohannes,
  • Chih-Yang Lin,
  • Timothy K. Shih,
  • Chen-Ya Hong,
  • Avirmed Enkhbat,
  • Fitri Utaminingrum

DOI
https://doi.org/10.1109/ACCESS.2021.3098713
Journal volume & issue
Vol. 9
pp. 102623 – 102635

Abstract


Signboards are important location landmarks that advertise services to a local community. Sighted people can easily understand the meaning of a signboard from its distinctive shape, but visually impaired people cannot, and they need an assistive system to guide them to destinations or help them understand their surroundings. Designing accurate assistive systems remains a challenge: computer vision struggles to recognize signboards because their designs vary widely and combine text with images, and few datasets exist for training strong models. In this paper, we propose a novel framework that automatically detects and recognizes signboard logos. In addition, we use Google Street View to collect signboard images from Taiwan's streets. The proposed framework includes a domain adaptation component that not only reduces the loss between the source and target datasets but also transfers important source features to the target dataset. In our model, we add nonlocal blocks and attention mechanisms, called deep attention networks, to achieve the best final result. We perform extensive experiments on both our dataset and public datasets to demonstrate the superior performance and effectiveness of our proposed method. The experimental results show that it outperforms state-of-the-art methods across all evaluation metrics.
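The abstract mentions adding nonlocal blocks to the detection backbone. For readers unfamiliar with that building block, the sketch below shows a minimal nonlocal (self-attention) block in the general style of Wang et al. (2018); the channel sizes, reduction factor, and placement before the detection head are illustrative assumptions, not the authors' released configuration.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Minimal nonlocal (self-attention) block over a 2D feature map.

    Illustrative sketch only; hyperparameters are assumptions, not the
    paper's exact architecture.
    """
    def __init__(self, in_channels, reduction=2):
        super().__init__()
        inter_channels = max(in_channels // reduction, 1)
        # 1x1 convolutions produce the query/key/value embeddings.
        self.theta = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        self.g = nn.Conv2d(in_channels, inter_channels, kernel_size=1)
        # Project back to the original channel count for the residual sum.
        self.out = nn.Conv2d(inter_channels, in_channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)            # attention over all spatial positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                         # residual connection

# Example: refine a hypothetical backbone feature map before the detection head.
features = torch.randn(1, 256, 32, 32)
refined = NonLocalBlock(256)(features)
print(refined.shape)  # torch.Size([1, 256, 32, 32])
```

Because every spatial position attends to every other position, such a block lets the network relate a logo region to distant context in the street-view image, which is the kind of long-range dependency the attention mechanism in the paper is intended to capture.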

Keywords