Computers (Jan 2024)

Error Pattern Discovery in Spellchecking Using Multi-Class Confusion Matrix Analysis for the Croatian Language

  • Gordan Gledec,
  • Mladen Sokele,
  • Marko Horvat,
  • Miljenko Mikuc

DOI
https://doi.org/10.3390/computers13020039
Journal volume & issue
Vol. 13, no. 2
p. 39

Abstract

Read online

This paper introduces a novel approach to the creation and application of confusion matrices for error pattern discovery in spellchecking for the Croatian language. The experimental dataset has been derived from a corpus of mistyped words and user corrections collected since 2008 using the Croatian spellchecker available at ispravi.me. The important role of confusion matrices in enhancing the precision of spellcheckers, particularly within the diverse linguistic context of the Croatian language, is investigated. Common causes of spelling errors, emphasizing the challenges posed by diacritic usage, have been identified and analyzed. This research contributes to the advancement of spellchecking technologies and provides a more comprehensive understanding of linguistic details, particularly in languages with diacritic-rich orthographies, like Croatian. The presented user-data-driven approach demonstrates the potential for custom spellchecking solutions, especially considering the ever-changing dynamics of language use in digital communication.

Keywords