Buana Information Technology and Computer Sciences (Jun 2024)
Identification of Socio Economic Registration Data Using OCR Based Tesseract and Google Cloud Vision
Abstract
The Indonesian government's Socio-Economic Registration program (Regsosek) aims to measure and monitor the socio-economic conditions of low-income people. Regsosek data are also a relevant source for research: they are used to analyze the influence of economic and social infrastructure on economic growth, to analyze the socio-economic determinants of work accident insurance ownership among informal workers, to create a women's socio-economic vulnerability index (IKSEP), and to study intercultural literacy from social, economic, and political perspectives. The success of the program depends on the data collection officers, or surveyors, who interact directly with the community to gather Regsosek data. The current collection process has an obstacle that significantly affects the overall survey results: surveyors record responses by hand on a paper form and then enter them manually into the website. This process is vulnerable to human error, because handwriting can be difficult to read and mistakes can be made during data input. Optical character recognition (OCR) can address this problem by identifying handwritten text and converting it into editable digital text that can be processed automatically. This research shows that the proposed method performs well, with an accuracy of 96.45%, a character error rate (CER) of 0.3%, and a word error rate (WER) of 4.30%.
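For readers unfamiliar with the reported metrics, the following is a minimal sketch of how CER and WER are commonly computed from the Levenshtein edit distance between a ground-truth transcription and the OCR output. The abstract does not state which implementation or normalization was used, so the function names (`edit_distance`, `cer`, `wer`) and the sample strings are illustrative assumptions, not the authors' code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one misread character in a handwritten address field.
ground_truth = "jalan merdeka no 10"
ocr_output = "jalan merdeke no 10"
print(f"CER: {cer(ground_truth, ocr_output):.2%}")
print(f"WER: {wer(ground_truth, ocr_output):.2%}")
```

A single misread character changes only one character but invalidates a whole word, which is why WER is typically higher than CER for the same OCR output.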
Keywords