A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents

Supriya Mahadevkar; Shruti Patil; Ketan Kotecha; Ajith Abraham

doi:10.7717/peerj-cs.1769

PeerJ Computer Science (Feb 2024)

Affiliations

Supriya Mahadevkar: Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Shruti Patil: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
Ketan Kotecha: Symbiosis Centre for Applied Artificial Intelligence, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
Ajith Abraham: School of Computer Science Engineering & Technology, Bennett University, Greater Noida, Uttar Pradesh, India

DOI: https://doi.org/10.7717/peerj-cs.1769
Journal volume & issue: Vol. 10, p. e1769

Abstract

Object detection methods based on deep learning have been used in a variety of sectors including banking, healthcare, e-governance, and academia. In recent years, considerable research attention has been paid to text detection and recognition from different scenes or images in unstructured document processing. The article’s novelty lies in the detailed discussion and implementation of various transfer learning-based backbone architectures for printed text recognition. In this research article, the authors compared the ResNet50, ResNet50V2, ResNet152V2, Inception, Xception, and VGG19 backbone architectures with preprocessing techniques such as data resizing, normalization, and noise removal on a standard OCR Kaggle dataset. Further, the top three backbone architectures were selected based on the accuracy achieved, and hyperparameter tuning was then performed to achieve more accurate results. Xception performed well compared with the ResNet, Inception, VGG19, and MobileNet architectures, achieving high evaluation scores with an accuracy of 98.90% and a minimum loss of 0.19. According to existing research in this domain, transfer learning-based backbone architectures applied to printed or handwritten data recognition are not yet well represented in the literature. We split the total dataset into 80 percent for training and 20 percent for testing, fed it into the different backbone architecture models with the same number of epochs, and found that the Xception architecture achieved higher accuracy than the others. In addition, the ResNet50V2 model achieved higher accuracy (96.92%) than the ResNet152V2 model (96.34%).
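
The 80/20 split and the normalization step mentioned in the abstract can be illustrated with a minimal, standard-library sketch. This is not the authors' code: the helper names `train_test_split` and `normalize_pixels`, the fixed seed, the placeholder file names, and the assumption of 8-bit grayscale pixel values are all hypothetical and chosen only for illustration.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists.

    Hypothetical helper; the abstract only states an 80/20 split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def normalize_pixels(image):
    """Scale pixel values from [0, 255] to [0.0, 1.0].

    Assumes `image` is a list of rows of 8-bit grayscale values.
    """
    return [[pixel / 255.0 for pixel in row] for row in image]


# Placeholder dataset: (file name, class label) pairs, not real data.
samples = [(f"img_{i:03d}.png", i % 5) for i in range(100)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

In practice, each backbone would be trained on the same `train` portion and evaluated on the same `test` portion, so that accuracy differences reflect the architecture and not the data split.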

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
