Bridging Linguistic Gaps: Developing a Greek Text Simplification Dataset

Leonidas Agathos; Andreas Avgoustis; Xristiana Kryelesi; Aikaterini Makridou; Ilias Tzanis; Despoina Mouratidis; Katia Lida Kermanidis; Andreas Kanavos

doi:10.3390/info15080500

Information (Aug 2024)

Affiliations

Leonidas Agathos: Department of Informatics, Ionian University, 49100 Corfu, Greece
Andreas Avgoustis: Department of Informatics, Ionian University, 49100 Corfu, Greece
Xristiana Kryelesi: Department of Informatics, Ionian University, 49100 Corfu, Greece
Aikaterini Makridou: Department of Informatics, Ionian University, 49100 Corfu, Greece
Ilias Tzanis: Department of Informatics, Ionian University, 49100 Corfu, Greece
Despoina Mouratidis: Department of Informatics, Ionian University, 49100 Corfu, Greece
Katia Lida Kermanidis: Department of Informatics, Ionian University, 49100 Corfu, Greece
Andreas Kanavos: Department of Informatics, Ionian University, 49100 Corfu, Greece

DOI: https://doi.org/10.3390/info15080500
Journal volume & issue: Vol. 15, no. 8, p. 500

Abstract

Text simplification is crucial in bridging the comprehension gap in today’s information-rich environment. Despite advancements in English text simplification, languages with intricate grammatical structures, such as Greek, often remain under-explored. The complexity of Greek grammar, characterized by its flexible syntactic ordering, presents unique challenges that hinder comprehension for native speakers, learners, tourists, and international students. This paper introduces a comprehensive dataset for Greek text simplification, containing over 7500 sentences across diverse topics such as history, science, and culture, tailored to address these challenges. We outline the methodology for compiling this dataset, including a collection of texts from Greek Wikipedia, their annotation with simplified versions, and the establishment of robust evaluation metrics. Additionally, the paper details the implementation of quality control measures and the application of machine learning techniques to analyze text complexity. Our experimental results demonstrate the dataset’s initial effectiveness and potential in reducing linguistic barriers and enhancing communication, with initial machine learning models showing promising directions for future improvements in classifying text complexity. The development of this dataset marks a significant step toward improving accessibility and comprehension for a broad audience of Greek speakers and learners, fostering a more inclusive society.

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
