IEEE Access (Jan 2024)

The Reality of High Performing Deep Learning Models: A Case Study on Document Image Classification

  • Saifullah Saifullah,
  • Stefan Agne,
  • Andreas Dengel,
  • Sheraz Ahmed

DOI
https://doi.org/10.1109/ACCESS.2024.3425910
Journal volume & issue
Vol. 12
pp. 103537–103564

Abstract
Deep neural networks have achieved exceptional performance in document image classification, yet research into the explainability of these models remains limited. In this paper, we present a comprehensive study in which we analyze 9 different explainability methods across 10 state-of-the-art document classification models and 2 popular benchmark datasets, RVL-CDIP and Tobacco3482, making three major contributions. First, through an exhaustive qualitative and quantitative analysis of the various explainability approaches, we demonstrate that most of them perform poorly at generating useful explanations for document images. Only two techniques, Occlusion and DeepSHAP, provide relatively faithful explanations, with DeepSHAP additionally offering better interpretability, where Occlusion falls short. Second, to pinpoint the features most crucial to the models' predictions, we present an approach for generating counterfactual explanations; its analysis reveals that many document classification models are highly susceptible to minor perturbations of the input. It also suggests that these models may easily fall victim to biases in the document data, ultimately relying on seemingly irrelevant features to make their decisions: on the RVL-CDIP dataset, we show that 25-50% of overall model predictions, and up to 60% of predictions for some classes, depend strongly on such irrelevant features. Lastly, our analysis reveals that the popular document benchmark datasets, RVL-CDIP and Tobacco3482, are inherently biased, with document identification (ID) numbers of specific styles consistently appearing in certain document regions. Left unaddressed, this bias allows models to predict document classes solely by looking at the ID numbers and prevents them from learning more complex document features. Overall, by unveiling the strengths and weaknesses of various explainability methods, document datasets, and deep learning models, our work presents a major step towards more transparent and robust AI-powered document image classification systems.
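One of the two attribution methods the abstract singles out as relatively faithful, Occlusion, works by sliding a masking patch over the input and measuring how much the target-class score drops when each region is hidden. The following is a minimal NumPy sketch of that general idea, not the paper's implementation; `occlusion_map` and the toy classifier (which deliberately fixates on a corner region, mimicking the ID-number bias the paper describes) are illustrative names and assumptions.

```python
import numpy as np

def occlusion_map(image, predict_fn, target_class, patch=16, stride=16, fill=0.5):
    """Occlusion sensitivity: slide a gray patch over the image and record the
    drop in the target-class score when that region is hidden. Returns a
    heatmap with one cell per patch position."""
    h, w = image.shape[:2]
    base = predict_fn(image)[target_class]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = base - predict_fn(occluded)[target_class]
    return heat

# Toy "classifier" (illustrative): class-0 score is the mean intensity of the
# top-left 16x16 region, mimicking a model that fixates on an ID number there.
def toy_predict(img):
    corner = img[:16, :16].mean()
    return np.array([corner, 1.0 - corner])

img = np.ones((64, 64))
heat = occlusion_map(img, toy_predict, target_class=0)
# Only occluding the top-left patch changes the score, so the heatmap's
# largest entry sits at cell (0, 0), exposing the region the model relies on.
```

A faithful explanation method should concentrate the heatmap mass on exactly such decision-critical regions, which is how the paper's counterfactual analysis can flag predictions driven by irrelevant features.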

Keywords