IEEE Access (Jan 2024)
Synthetic Data Generation for Text Spotting on Printed Circuit Board Component Images
Abstract
Machine vision systems with built-in text detection and text recognition features are useful in manufacturing processes for automatically locating and reading text markings on mounted printed circuit board (PCB) components. To better handle input images with varying image quality, text quality, and text variations, the robustness of deep learning approaches for end-to-end text spotting on PCB component images is worth exploring. However, the limitations of public PCB component datasets for such research and the data imbalance in actually collected PCB component datasets hinder the training of deep learning text spotting models and consequently necessitate the generation of synthetic data. In this study, a synthetic PCB component dataset is generated using our synthetic data generator, which adds synthetic text with random character sequences onto manually edited PCB component images to enhance the realism of the synthetic images. The synthetic dataset covers 66 character classes and provides synthetic text with diverse variations in font, style, size, and color. We train an existing text spotting model, Text Perceptron, on both real and synthetic datasets to detect and recognize arbitrarily shaped text markings on PCB components. Our synthetic PCB component dataset improves the text spotting performance of Text Perceptron: the trained model achieves promising text detection results and an encouraging end-to-end text spotting F-score on real PCB component images, while maintaining an acceptable average inference time per image. Nevertheless, the text spotting performance of the trained model requires further improvement before it can be deployed for PCB component inspection.
Keywords