IJCCS (Indonesian Journal of Computing and Cybernetics Systems) (Jul 2023)
Deep Learning Approaches for Nusantara Scripts Optical Character Recognition
Abstract
The number of speakers of regional languages in Indonesia who can read and write traditional scripts is decreasing. If this trend continues, Nusantara scripts may become extinct, and the knowledge of how to read them could be lost. To anticipate this, this study aims to preserve the knowledge of reading ancient scripts by developing a deep learning model that can read document images written in any one of the ten Nusantara scripts we collected: Bali, Batak, Bugis, Javanese, Kawi, Kerinci, Lampung, Pallava, Rejang, and Sundanese. Previous studies have attempted to read traditional Nusantara scripts using various machine learning and Convolutional Neural Network algorithms, but they focused primarily on individual scripts and lacked an integrated approach that goes from script type recognition to character recognition. This study is the first to comprehensively address the entire range of Nusantara scripts, covering both script type detection and character recognition. We applied Convolutional Neural Network, ConvMixer, and Vision Transformer models and compared their performance. Our models achieved 96% accuracy in classifying Nusantara script types, and character recognition accuracy ranged from 93% to approximately 100% across the ten scripts.
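The abstract describes an integrated, two-stage approach: first identify which script a document image is written in, then pass the image to a character recognizer for that script. The following is a minimal Python sketch of that routing logic only, under stated assumptions. The function name `recognize_document`, the callable interfaces, and the stub models are hypothetical placeholders, not the authors' code. In practice the trained CNN, ConvMixer, or Vision Transformer models would be supplied as the callables.

```python
from typing import Callable, Dict, List, Tuple

# The ten scripts named in the abstract.
SCRIPTS = (
    "Bali", "Batak", "Bugis", "Javanese", "Kawi",
    "Kerinci", "Lampung", "Pallava", "Rejang", "Sundanese",
)

# Hypothetical interfaces (assumptions, not from the paper): the stage-1
# classifier maps an image to a script name, and each stage-2 recognizer
# maps an image to a list of recognized characters.
ScriptClassifier = Callable[[object], str]
CharRecognizer = Callable[[object], List[str]]


def recognize_document(
    image: object,
    classify_script: ScriptClassifier,
    recognizers: Dict[str, CharRecognizer],
) -> Tuple[str, List[str]]:
    """Hypothetical two-stage pipeline: detect the script type, then
    dispatch the image to the character recognizer for that script."""
    script = classify_script(image)  # stage 1: script type detection
    if script not in recognizers:
        raise KeyError(f"no character recognizer for script {script!r}")
    characters = recognizers[script](image)  # stage 2: character recognition
    return script, characters


if __name__ == "__main__":
    # Stub models standing in for trained networks (illustration only).
    def stub_classifier(img: object) -> str:
        return "Javanese"

    stub_recognizers = {
        name: (lambda img, n=name: [f"<{n} char>"]) for name in SCRIPTS
    }
    print(recognize_document("page.png", stub_classifier, stub_recognizers))
```

Keeping the per-script recognizers in a dictionary keyed by script name lets the script classifier and each character recognizer be trained and replaced independently, which mirrors the abstract's separation of script type detection from character recognition.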
Keywords