Journal of the Text Encoding Initiative (Aug 2019)
Encoding Disappearing Characters: The Case of Twentieth-Century Japanese-Canadian Names
Abstract
The Landscapes of Injustice project seeks to encode mid-twentieth-century documents by and about the Japanese-Canadian community so that they are accessible to modern audiences. The fundamental problem is that some of the kanji used at that time have since been replaced by different kanji, and others have been removed from the lists of formally acceptable characters. This report documents our efforts with two technologies designed to address this situation. The first is the Standardized Variation Sequence (SVS) feature of Unicode. Our work revealed that this set of variation sequences does not completely cover the old and new glyph pairs identified by the Japanese authorities, and that those formally identified pairs do not, in turn, cover all the new glyph forms in general use. We therefore turned to a second technology, TEI’s <charDecl>, <glyph>, and <mapping> elements, to augment the support provided by Unicode. Lastly, we addressed the problem of finding suitably qualified people to do the markup. The result is markup that retains the original glyphs and relates them to the modern glyphs, so that our output products will be able to support search and display using either form of the glyph.
Keywords