Nongye tushu qingbao xuebao (Apr 2023)
Application of EfficientNet-based Transfer Learning in Image Classification of Modern Documents: Taking Shanghai Library's "Picture Gallery of Modern Chinese Literature" as an Example
Abstract
[Purpose/Significance] As important historical sources, images in modern literature are increasingly valued by humanities researchers, and the deep annotation of large-scale image resources has become an important part of building image data infrastructure. Analyzing the content of massive image collections with technologies such as deep learning is a new direction in image studies. This paper addresses the challenges of automatically classifying large-scale modern document images, and aims to improve accuracy and efficiency in practical applications through an empirical study of transfer learning based on a simplified EfficientNet network optimized specifically for modern document image classification. [Method/Process] This paper uses selected images from Shanghai Library's "Illustrated Century - Modern Chinese Literature Library", a resource in which the "National Newspaper Index" explores the image content of modern Chinese literature in depth. Based on an analysis of the characteristics of modern literature images, we increased the diversity of the 7,645 selected sample images by serially stacking image enhancement techniques such as cropping, white balance, posterization (tone separation), and affine transformation. We then carried out transfer learning by fine-tuning a simplified EfficientNet deep convolutional neural network model, informed by a study of deep learning algorithms. Finally, we identified an optimized model that performs well in modern literature image classification. Our simplified model achieved an average Top-1 classification accuracy of 90.97% and an average F1 score of 91.00%, which validated its simplicity, efficiency, and good generalization ability for modern literature image classification. During the experiments, we also found that some images were highly similar and convergent in morphology, which led to unsatisfactory classification results for those images.
However, this finding provides valuable insights for further optimizing and simplifying the EfficientNet network model. A performance comparison with the ResNet50-vd network also shows that the simplified EfficientNet network can support incremental, iterative training of subsequent models more economically and efficiently, toward high-precision artificial intelligence classification of modern literature image databases. [Results/Conclusions] The experimental results indicate that the model effectively improves the efficiency and accuracy of image classification, and thus has practical value for solving the automatic classification challenges posed by large-scale images in modern literature. In the future, we will continue to explore its application to extracting semantic information from digital images. Through digital image pre-processing and the extraction of image content and features, we will provide technical support for the automatic extraction of semantic information, so as to substantially reduce the manual workload and achieve semantic description of millions of images.
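The two reported metrics, average Top-1 accuracy and average F1, can be illustrated with a minimal sketch. It assumes that the "average F1" is a macro average (the unweighted mean of per-class F1 scores), which the abstract does not state. The function names and the class labels in the example are hypothetical and are not taken from the paper.

```python
from collections import Counter


def top1_accuracy(y_true, y_pred):
    """Share of samples whose highest-ranked predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall for this class.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six images, for illustration only.
y_true = ["portrait", "portrait", "map", "map", "cartoon", "cartoon"]
y_pred = ["portrait", "map", "map", "map", "cartoon", "portrait"]

print(f"Top-1 accuracy: {top1_accuracy(y_true, y_pred):.4f}")
print(f"Macro F1:       {macro_f1(y_true, y_pred):.4f}")
```

Macro averaging gives every class equal weight regardless of how many images it contains, which is one reason the F1 figure can differ slightly from the accuracy figure on an imbalanced image collection.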
Keywords