Information (Jun 2023)

Multi-Task Romanian Email Classification in a Business Context

  • Alexandru Dima
  • Stefan Ruseti
  • Denis Iorga
  • Cosmin Karl Banica
  • Mihai Dascalu

DOI
https://doi.org/10.3390/info14060321
Journal volume & issue
Vol. 14, no. 6
p. 321

Abstract

Email classification systems are essential for handling and organizing the massive flow of communication, especially in a business context. Although many solutions exist, the lack of standardized classification categories limits their applicability. Furthermore, the lack of public, business-oriented datasets in Romanian makes such solutions difficult to develop. To this end, we introduce a versatile automated email classification system based on a novel public dataset of 1447 manually annotated Romanian business-oriented emails. Our corpus is annotated with 5 token-related labels, as well as 5 sequence-related classes. We establish a strong baseline using pre-trained Transformer models for token classification and multi-task classification, achieving F1-scores of 0.752 and 0.764, respectively. We publicly release our code together with the dataset of labeled emails.

Keywords