Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Feb 2024)

The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine

  • Ircham Aji Nugroho,
  • Bety Hayat Susanti,
  • Mareta Wahyu Ardyani,
  • Nadia Paramita R.A.

DOI
https://doi.org/10.29207/resti.v8i1.5151
Journal volume & issue
Vol. 8, no. 1
pp. 42 – 53

Abstract


The 2019 Indonesian election used the Vote Counting Information System (Sistem Informasi Penghitungan Suara, Situng) to make the recapitulation process transparent. The data displayed in Situng come from the C1 documents of 813,336 voting stations across Indonesia. Officers of the Municipal General Election Commission (GEC) enter the data from each C1 document and upload it to Situng. Because this process is performed by humans, it is prone to errors: during the recapitulation of the 2019 election results, 269 data entry errors occurred, and data entry did not meet its target schedule, which caused delays. There were also cases of C1 documents being modified, which raised concerns about the authenticity of the data. Automatic data entry is a plausible way to avoid human error and speed up data entry. The data to be entered are text in document images that all follow the same template, so optical character recognition (OCR) can be used to read the text, and improving image quality and alignment beforehand makes the OCR reading areas more accurate. In this study, we developed a C1 document data extraction application systematically, following the waterfall software development life cycle (SDLC) method. The application uses Tesseract, an open-source OCR engine and command-line program that recognizes text characters in digital images. The accuracy achieved with this method is not yet sufficient for the application to replace Situng's data entry officers. To guarantee the integrity of C1 documents, we use the RSA-2048 digital signature scheme. Combining the Tesseract-OCR engine for character recognition with digital signatures provides a comprehensive solution for reducing the human errors that can lead to miscalculations and inaccurate processes.
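
Because every C1 form shares one template, each field sits at a fixed position once a scan has been cleaned up and aligned, so it can be cut out before OCR. The following is a minimal, standard-library-only sketch of that idea, not the authors' implementation. It assumes the aligned page is already a 2D list of 8-bit grayscale values, uses a simple global threshold as a stand-in for the image-quality step, and takes its field coordinates from `FIELD_BOXES`, a hypothetical template (the real C1 coordinates are not given in the abstract). In a full pipeline each crop would then be passed to Tesseract, for example through the pytesseract wrapper's `image_to_string`; that call is left out so the sketch runs without Tesseract installed.

```python
from typing import Dict, List, Tuple

# A grayscale image as rows of 8-bit pixel values (0 = black, 255 = white).
Image = List[List[int]]
# Bounding box as (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]

# HYPOTHETICAL template: field name -> box on an aligned page.
# Real C1 coordinates would have to be measured from the official form.
FIELD_BOXES: Dict[str, Box] = {
    "candidate_1_votes": (0, 0, 2, 3),
    "candidate_2_votes": (2, 0, 4, 3),
}


def binarize(image: Image, threshold: int = 128) -> Image:
    """Global threshold: pixels darker than `threshold` become 0, the rest 255."""
    return [[0 if px < threshold else 255 for px in row] for row in image]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def extract_regions(image: Image, template: Dict[str, Box]) -> Dict[str, Image]:
    """Binarize the aligned page once, then cut out every template field."""
    clean = binarize(image)
    return {name: crop(clean, box) for name, box in template.items()}


if __name__ == "__main__":
    # Tiny 4x3 "page" standing in for an aligned scan.
    page = [
        [10, 200, 30],
        [250, 90, 240],
        [130, 127, 255],
        [0, 255, 64],
    ]
    regions = extract_regions(page, FIELD_BOXES)
    print(regions["candidate_1_votes"])  # [[0, 255, 0], [255, 0, 255]]
    print(regions["candidate_2_votes"])  # [[255, 0, 255], [0, 255, 0]]
```

Fixed boxes only work if the scan is aligned to the template first: a shift of a few pixels moves every box off its field, which is why alignment is done before the reading areas are cut out.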

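The abstract names RSA-2048 signatures as the integrity mechanism but does not specify the hash function or padding. The sketch below assumes SHA-256 with RSASSA-PKCS1-v1_5 encoding (RFC 8017) and shows the hash-then-sign flow: the signer hashes the C1 data, encodes the digest, and raises it to the private exponent, and anyone holding the public key can check that the data were not altered. Key generation and the modular arithmetic are hand-rolled with the Python standard library for illustration only; they are not constant-time, and a real deployment should use a vetted library such as OpenSSL or the `cryptography` package, where RSA-PSS is generally preferred for new designs. The record string in the usage example is hypothetical.

```python
import hashlib
import secrets
from typing import Tuple

# DER prefix of the SHA-256 DigestInfo structure (RFC 8017, Section 9.2, Note 1).
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")
SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test with a small-prime pre-filter."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits: int, e: int) -> int:
    """Random prime of exactly `bits` bits with gcd(e, p - 1) == 1 (e is prime)."""
    while True:
        # Set the top two bits so p * q has exactly 2 * bits bits; force odd.
        c = secrets.randbits(bits) | (1 << (bits - 1)) | (1 << (bits - 2)) | 1
        if c % e != 1 and is_probable_prime(c):
            return c


def generate_keypair(bits: int = 2048, e: int = 65537) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((n, e), (n, d)) for an RSA modulus of `bits` bits."""
    while True:
        p, q = random_prime(bits // 2, e), random_prime(bits // 2, e)
        if p != q:
            break
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, e), (n, d)


def encode(message: bytes, k: int) -> int:
    """EMSA-PKCS1-v1_5: 0x00 || 0x01 || 0xFF... || 0x00 || DigestInfo(SHA-256)."""
    t = SHA256_PREFIX + hashlib.sha256(message).digest()
    if k < len(t) + 11:
        raise ValueError("modulus too short")
    em = b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t
    return int.from_bytes(em, "big")


def sign(message: bytes, private_key: Tuple[int, int]) -> bytes:
    n, d = private_key
    k = (n.bit_length() + 7) // 8
    return pow(encode(message, k), d, n).to_bytes(k, "big")


def verify(message: bytes, signature: bytes, public_key: Tuple[int, int]) -> bool:
    n, e = public_key
    k = (n.bit_length() + 7) // 8
    if len(signature) != k:
        return False
    s = int.from_bytes(signature, "big")
    return s < n and pow(s, e, n) == encode(message, k)


if __name__ == "__main__":
    public_key, private_key = generate_keypair(2048)
    record = b"TPS 001 | candidate_1=120 | candidate_2=95"  # hypothetical C1 data
    sig = sign(record, private_key)
    print(verify(record, sig, public_key))         # True
    print(verify(record + b"0", sig, public_key))  # False: data were altered
```

Any change to the signed record, even one character, produces a different SHA-256 digest, so verification fails; this is the property that lets a tampered C1 record be detected.
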
Keywords