Journal of Applied Engineering and Technological Science (Sep 2022)

Implementation of OCR (Optical Character Recognition) Using Tesseract in Detecting Character in Quotes Text Images

  • Ikha Novie Tri Lestari
  • Dadang Iskandar Mulyana

DOI
https://doi.org/10.37385/jaets.v4i1.905
Journal volume & issue
Vol. 4, no. 1

Abstract

Technology in Indonesia is advancing rapidly and has become an unavoidable part of daily life. Artificial Intelligence is increasingly used to help people solve problems, and computers and smartphones make it widely accessible. One such application is Optical Character Recognition (OCR). This research is motivated by an existing system that still relies on manual input and needs to be developed so that it can detect the characters in quote text images. OCR has been widely used to extract characters from digital images. The performance of OCR methods and techniques depends heavily on normalization, the initial step that precedes later stages such as segmentation and identification. Image normalization aims to produce a better input image so that segmentation and identification can achieve optimal accuracy. To get the best results, several pre-processing stages must be applied to the image before recognition. In this work, recognition is performed with Tesseract-OCR. The resulting OCR program successfully scanned quote text images, can be used when the original document is lost or damaged, and saves time otherwise spent creating, processing, and typing documents.
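The normalization step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which pre-processing operations the authors applied, so the grayscale conversion and Otsu binarization below are assumptions chosen as common examples. The function names (to_grayscale, otsu_threshold, binarize, ocr_binary_image) are hypothetical. The last function assumes that Pillow, the pytesseract wrapper, and the Tesseract binary are installed; the paper does not state which wrapper was used.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def to_grayscale(rgb_rows: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Convert RGB rows to 8-bit luminance using ITU-R BT.601 weights."""
    return [
        [min(255, int(round(0.299 * r + 0.587 * g + 0.114 * b))) for (r, g, b) in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray: Sequence[Sequence[int]]) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels with value <= t form the dark class, the rest form the light class.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


def ocr_binary_image(binary: Sequence[Sequence[int]]) -> str:
    """Run Tesseract on a binarized image (assumes Pillow and pytesseract)."""
    # Imported lazily so the normalization helpers work without these packages.
    from PIL import Image
    import pytesseract

    height, width = len(binary), len(binary[0])
    img = Image.new("L", (width, height))
    img.putdata([v for row in binary for v in row])
    return pytesseract.image_to_string(img)
```

Binarizing before recognition separates dark text from a light background, which is one way to give the later segmentation and identification stages a cleaner input. A real pipeline would more likely use an image library for these steps; the pure-Python version is only meant to make the logic visible.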

Keywords