IEEE Access (Jan 2024)
Exploiting Hanja-Based Resources in Processing Korean Historic Documents Written by Common Literati
Abstract
This study investigates the understanding of historical Korean documents written by common literati. Korean historical documents have been studied extensively, but most prior work focuses solely on royal documents. By comparing the linguistic characteristics of royal and commoner language, we question whether approaches built around royal language carry over to commoner documents. In particular, we examine the feasibility and limitations of applying existing resources that share the writing system of historical Korean documents (Hanja) to the processing of common literati documents. Based on this investigation, we propose a simple yet effective method that makes Hanja-based language resources usable for these documents: the removal of special characters. We demonstrate that aligning the characteristics of Hanja-based resources in this way yields considerable performance improvements. To the best of our knowledge, this is the first study to focus on the understanding of common literati documents.
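As a rough illustration of the special-character removal step, the following is a minimal sketch rather than the authors' implementation. It assumes that "special characters" means anything outside the Unicode CJK ideograph blocks (punctuation, brackets, whitespace, editorial marks); the abstract does not define the term, and the function name `strip_special_characters` is hypothetical.

```python
import re

# Assumption: a "special character" is any character that is not a CJK ideograph.
# Covered ranges: CJK Extension A, CJK Unified Ideographs,
# CJK Compatibility Ideographs, and the supplementary-plane extensions.
_NON_HANJA = re.compile(
    r"[^\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF]"
)


def strip_special_characters(text: str) -> str:
    """Return `text` with every non-Hanja character removed."""
    return _NON_HANJA.sub("", text)


if __name__ == "__main__":
    # Punctuation and brackets are dropped; the Hanja characters are kept in order.
    print(strip_special_characters("天地、玄黃。[宇宙]"))
```

A filter of this kind could be applied to text before it is passed to a model or tokenizer built on Hanja-based resources; which characters to keep would depend on the corpus.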
Keywords