IEEE Access (Jan 2024)

Identification of Illegal Outdoor Advertisements Based on CLIP Fine-Tuning and OCR Technology

  • Haiyan Zhang,
  • Zheng Ding,
  • Md Sharid Kayes Dipu,
  • Pinrong Lv,
  • Yuxue Huang,
  • Hauwa Suleiman Abdullahi,
  • Ao Zhang,
  • Zhaoyu Song,
  • Yuanyuan Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3424258
Journal volume & issue
Vol. 12
pp. 92976 – 92987

Abstract

Recognizing unauthorized outdoor advertising is important for a city’s visual appeal, organizational structure, and regulatory compliance. Traditional models struggle to accurately identify illegal outdoor advertisements that contain only text or text variants when recognizing advertisements that combine images and text. This paper addresses that problem with a method based on CLIP fine-tuning and OCR technology, organized as two processes. The first process uses a fine-tuned CLIP model to perform image recognition on outdoor advertisements in image-text form. Fine-tuning uses combined image and text data, builds on the zero-shot classification ability of CLIP, and integrates Tip-Adapter and a cache model. Few-shot learning is incorporated into the fine-tuning process to address the scarcity of data on illegal outdoor advertising. The model is trained on features extracted from images of illegal outdoor advertisements, enabling it to capture the diversity of such advertisements, adapt to various languages, and develop reasoning and context understanding capabilities. The second process uses the PP-OCRv4 model to extract text from outdoor advertisement images and matches it against keywords in a pre-established banned word database. Together, these two processes achieve recognition of outdoor advertisements in image-text form. Experimental results show that the method achieves a testing accuracy of 93.5% on a self-built dataset of outdoor advertisement images. Furthermore, the PP-OCRv4 model improves text recognition accuracy by 3.83% compared to the traditional PP-OCRv3 model, while image recognition accuracy improves by 15.46% over the traditional ResNet50 model. The proposed combination of CLIP fine-tuning and OCR therefore improves, to a certain extent, the recognition accuracy of illegal outdoor advertisements that combine images and text.
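
The two processes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The first function follows the published Tip-Adapter scoring rule, in which zero-shot CLIP logits are combined with a key-value cache built from few-shot features; the values of `alpha` and `beta` and the factor of 100 are illustrative placeholders, and all features are assumed to be L2-normalised. The second part matches OCR output against a banned word list; the normalisation (NFKC, case folding, dropping non-alphanumeric characters) is an assumption made here to catch simple text variants, and the names `tip_adapter_logits`, `normalize`, and `find_banned` are hypothetical.

```python
import math
import unicodedata


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def tip_adapter_logits(feature, cache_keys, cache_values, class_weights,
                       alpha=1.0, beta=5.5):
    """Combine zero-shot logits with few-shot cache logits (Tip-Adapter style).

    feature       : L2-normalised image feature of the test image
    cache_keys    : L2-normalised features of the few-shot training images
    cache_values  : one-hot labels of those training images
    class_weights : L2-normalised text embedding per class
    alpha, beta   : illustrative hyperparameters (assumed, not from the paper)
    """
    # Zero-shot term: scaled cosine similarity to each class text embedding.
    zero_shot = [100.0 * dot(feature, w) for w in class_weights]
    # Cache term: each stored sample votes for its label, weighted by affinity.
    cache = [0.0] * len(class_weights)
    for key, one_hot in zip(cache_keys, cache_values):
        affinity = math.exp(-beta * (1.0 - dot(feature, key)))
        for c, label in enumerate(one_hot):
            cache[c] += affinity * label
    return [z + alpha * c for z, c in zip(zero_shot, cache)]


def normalize(text):
    """Fold width and case, then keep only letters and digits (assumed rule)."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def find_banned(ocr_lines, banned_words):
    """Return the banned words that occur in the OCR text, sorted."""
    joined = normalize("".join(ocr_lines))
    hits = []
    for word in banned_words:
        needle = normalize(word)
        if needle and needle in joined:
            hits.append(word)
    return sorted(hits)


# Toy data: two classes, one cached few-shot sample belonging to class 1.
logits = tip_adapter_logits(
    feature=[1.0, 0.0],
    cache_keys=[[0.0, 1.0]],
    cache_values=[[0, 1]],
    class_weights=[[1.0, 0.0], [0.0, 1.0]],
)

# Toy OCR output containing full-width letters and extra spacing.
hits = find_banned(
    ["ＢＥＳＴ  price", "No.1 in town"],
    {"best price", "no.1", "guaranteed cure"},
)
```

In a real pipeline the feature vectors would come from the CLIP encoders and `ocr_lines` from the OCR model's output. How the two results are merged into a final decision is not specified in the abstract, so the sketch leaves them separate.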

Keywords