Mathematics (Oct 2024)
Mathematical Data Models and Context-Based Features for Enhancing Historical Degraded Manuscripts Using Neural Network Classification
Abstract
A common cause of deterioration in historic manuscripts is ink transparency or bleeding from the opposite page. Philologists and paleographers can significantly benefit from minimizing these interferences when attempting to decipher the original text. Additionally, computer-aided text analysis can also gain from such text enhancement. In previous work, we proposed the use of neural networks (NNs) in combination with a data model that characterizes the damage when both sides of a page have been digitized. This approach offers the distinct advantage of allowing the creation of an artificial training set that teaches the NN to differentiate between clean and damaged pixels. We tested this concept using a shallow NN, which proved effective in categorizing texts with varying levels of deterioration. In this study, we adapt the NN design to tackling remaining classification uncertainties caused by areas of text overlap, inhomogeneity, and peaks of degradation. Specifically, we introduce a new output class for pixels within overlapping text areas and incorporate additional features related to the pixel context information to promote the same classification for pixels adjacent to each other. Our experiments demonstrate that these enhancements significantly improve the classification accuracy. This improvement is evident in the quality of both binarization, which aids in text analysis, and virtual restoration, aimed at recovering the manuscript’s original appearance. Tests conducted on a public dataset, using standard quality indices, reveal that the proposed method outperforms both our previous proposals and other notable methods found in the literature.
Keywords