IEEE Access (Jan 2024)
Synthetic Data Generation for Text Spotting on Printed Circuit Board Component Images
Abstract
Machine vision systems with built-in text detection and text recognition features are useful in manufacturing processes for automatically locating and reading text markings on mounted printed circuit board (PCB) components. To better handle input images with varying image quality, text quality, and text variations, the robustness of deep learning approaches for end-to-end text spotting on PCB component images is worth exploring. However, the limitations of public PCB component datasets for such research and the data imbalance in actually collected PCB component datasets hinder the training of deep learning text spotting models and consequently necessitate the generation of synthetic data. In this study, a synthetic PCB component dataset is generated using our synthetic data generator, which adds synthetic text with random character sequences onto manually edited PCB component images to enhance the realism of the synthetic images. The synthetic dataset covers 66 character classes and provides synthetic text with diverse variations in font, style, size, and color. We train an existing text spotting model, Text Perceptron, on both real and synthetic datasets to detect and recognize arbitrarily shaped text markings on PCB components. Our synthetic PCB component dataset improves the text spotting performance of Text Perceptron: the trained model achieves promising text detection results and an encouraging end-to-end text spotting F-score on real PCB component images, while maintaining an acceptable average inference time per image. Nevertheless, the text spotting performance of the trained model requires further improvement before it can be deployed for PCB component inspection.
Keywords