Sensors (Jun 2024)

A Pix2Pix Architecture for Complete Offline Handwritten Text Normalization

  • Alvaro Barreiro-Garrido,
  • Victoria Ruiz-Parrado,
  • A. Belen Moreno,
  • Jose F. Velez

DOI
https://doi.org/10.3390/s24123892
Journal volume & issue
Vol. 24, no. 12
p. 3892

Abstract

Read online

In the realm of offline handwritten text recognition, numerous normalization algorithms have been developed over the years to serve as preprocessing steps prior to applying automatic recognition models to handwritten text scanned images. These algorithms have demonstrated effectiveness in enhancing the overall performance of recognition architectures. However, many of these methods rely heavily on heuristic strategies that are not seamlessly integrated with the recognition architecture itself. This paper introduces the use of a Pix2Pix trainable model, a specific type of conditional generative adversarial network, as the method to normalize handwritten text images. Also, this algorithm can be seamlessly integrated as the initial stage of any deep learning architecture designed for handwritten recognition tasks. All of this facilitates training the normalization and recognition components as a unified whole, while still maintaining some interpretability of each module. Our proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalizations, alongside other conventional preprocessing objectives, such as normalizing the size of text ascenders and descenders. We will demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm across two metrics and when integrated as the first step of a deep recognition architecture.

Keywords