Khazanah Informatika (Apr 2020)
Writer Identification of Lampung Handwritten Documents Based on Selected Characters
Abstract
Writer identification is a sub-field of handwriting recognition whose objective is to determine the identity of a writer from handwriting input. It is usually applied for forensic purposes, such as finding the perpetrators of crimes who leave evidence in the form of written messages. Writer identification can also be used to determine the identity of a historical figure who left a valuable written artefact. The object of this research is the traditional script of the Lampung region, known as Had Lampung by the local community. The script consists of 20 main characters and 12 diacritics. Writers are recognized from selected characters using Principal Component Analysis (PCA) features. PCA is a linear feature extraction method used in pattern recognition. The PCA algorithm consists of several stages: computing the mean of the dataset, subtracting the mean from each data vector, computing the covariance, computing the eigenvectors and eigenvalues, reducing the set of eigenvectors, and projecting the dataset onto the reduced eigenvector space. In this paper, PCA is used to extract features for image recognition. The dataset used in this study is the Lampung Dataset, a handwritten character recognition (HWCR) dataset consisting of 82 Lampung handwritten documents. All Lampung character images in the dataset were extracted from these documents using a connected component extraction algorithm, which produced 32,140 images. These images were then converted to grayscale. In this research, 12,500 grayscale images of Lampung handwritten characters were chosen to represent 82 different writers. These data serve as the training and testing data for the proposed method. The highest writer identification accuracy obtained with the PCA features is 82.92%, while the lowest is 28.29%.
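The PCA stages listed in the abstract can be illustrated with a minimal sketch. It assumes NumPy, uses random arrays as stand-ins for flattened grayscale character images, and picks an arbitrary number of retained components. The function names `pca_fit` and `pca_project`, the image size, and the value of `k` are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on X of shape (n_samples, n_features); keep k components."""
    mean = X.mean(axis=0)                    # stage 1: dataset mean
    centered = X - mean                      # stage 2: subtract the mean
    cov = np.cov(centered, rowvar=False)     # stage 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # stage 4: eigen-decomposition
    order = np.argsort(eigvals)[::-1][:k]    # stage 5: keep the k largest
    return mean, eigvecs[:, order], eigvals[order]

def pca_project(X, mean, components):
    """Stage 6: project samples onto the reduced eigenvector space."""
    return (X - mean) @ components

# Hypothetical stand-in data: 50 flattened 8x8 grayscale images.
rng = np.random.default_rng(0)
images = rng.random((50, 64))

mean, components, eigvals = pca_fit(images, k=10)
features = pca_project(images, mean, components)
print(features.shape)
```

Each row of `features` is a reduced feature vector that could be passed to a classifier for writer identification. The abstract does not name the classifier, so none is shown here.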
Keywords