EURASIP Journal on Image and Video Processing (Aug 2020)

Detection and recognition of cursive text from video frames

  • Ali Mirza,
  • Ossama Zeshan,
  • Muhammad Atif,
  • Imran Siddiqi

DOI
https://doi.org/10.1186/s13640-020-00523-5
Journal volume & issue
Vol. 2020, no. 1
pp. 1 – 19

Abstract

Textual content appearing in videos is a useful index for semantic retrieval of videos (from archives), generation of alerts (live streams), and high-level applications such as opinion mining and content summarization. The key components of such systems are the detection and recognition of textual content, which are the subject of our study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Textual regions in video frames are detected by fine-tuning deep neural network-based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text, with more than 13,000 video frames, is also developed. A comprehensive series of experiments is carried out, reporting an F-measure of 88.3% for detection and a recognition rate of 87%.

Keywords