Automatika (Apr 2022)

Restoration of deteriorated text sections in ancient document images using a tri-level semi-adaptive thresholding technique

  • N. Shobha Rani,
  • B. J. Bipin Nair,
  • M. Chandrajith,
  • G. Hemantha Kumar,
  • Jaume Fortuny

DOI
https://doi.org/10.1080/00051144.2022.2042462
Journal volume & issue
Vol. 63, no. 2
pp. 378 – 398

Abstract

This research aims to restore deteriorated text sections in photographs of ancient documents affected by stain markings, ink seepage and document ageing, all of which complicate document enhancement. To address these issues, the paper develops a tri-level semi-adaptive thresholding technique, with the primary focus on removing deteriorations that obscure text sections. The proposed algorithm comprises three levels of degradation removal together with pre- and post-enhancement processes. Level-wise degradation removal uses a global thresholding approach, whereas pseudo-colouring uses local thresholding procedures. Experiments on palm leaf and DIBCO document photographs show decent performance in removing ink and oil stains while retaining the obscured text sections. On both datasets, the system is also effective at removing common deteriorations such as uneven illumination, show-through, discolouration and writing marks. The proposed technique is directly comparable with other thresholding-based benchmark techniques, producing an average F-measure and precision of 65.73 and 93% on the DIBCO datasets and 55.24 and 94% on the palm leaf datasets. Subjective analysis shows the robustness of the proposed model in removing stain degradations: 45% of samples received a qualitative score of 3, indicating degradation removal with fairly readable text.

Highlights

  • This work presents a semi-adaptive binarization technique for ancient document image enhancement.
  • The main focus of this work is to restore obscured text sections.
  • A multi-level thresholding approach is used to remove degradations.
  • The gradient of the original image is used in computing a reference image to detect deteriorated text sections.
  • Pseudo-colouring and a post-enhancement process produce the final enhanced image.
  • DIBCO and palm leaf document samples are used for experimentation.
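
The abstract reports F-measure and precision for the binarized output. As a reading aid, the sketch below shows how these pixel-level metrics are commonly computed for binarization, assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-measure as their harmonic mean). The function name `binarization_scores` and the toy images are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def binarization_scores(predicted, ground_truth):
    """Return (precision, recall, f_measure) for two same-sized binary images.

    Images are lists of rows; 1 marks a text (foreground) pixel, 0 background.
    Assumes the standard pixel-level definitions, not the paper's own code.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p == 1 and g == 1:
                tp += 1  # text pixel correctly kept
            elif p == 1 and g == 0:
                fp += 1  # background (e.g. stain) wrongly kept as text
            elif p == 0 and g == 1:
                fn += 1  # text pixel wrongly removed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example: the prediction keeps only half of the true text pixels
# but introduces no false text pixels.
ground_truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
predicted = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
precision, recall, f_measure = binarization_scores(predicted, ground_truth)
print(precision, recall, round(f_measure, 4))
```

In this toy case precision is perfect while the F-measure is noticeably lower, because missed text pixels reduce recall. Under the same definitions, a high precision combined with a moderate F-measure would generally point to some text pixels being lost along with the removed degradations.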

Keywords