Applied Sciences (Sep 2024)
Automatic Era Identification in Classical Arabic Poetry
Abstract
The authenticity of classical Arabic poetry has long been challenged by claims that part of the pre-Islamic poetic heritage should not be attributed to that era. According to these claims, some of this legacy was produced after the advent of Islam and ascribed, for various reasons, to pre-Islamic poets. Because pre-Islamic poets were illiterate, medieval devotees of Arabic literature relied on Bedouin oral transmission when they collected and wrote down the poems about two centuries later. This process left unresolved both the identity of the poets who actually composed these poems and the period in which they worked. In this work, we ask how, and to what extent, the period in which a classical Arabic poem was composed can be identified using modern automatic text processing techniques. We consider a dataset of Arabic poetry collected from the diwans (‘collections of poems’) of thirteen Arabic poets, covering two main eras: the pre-ʿAbbāsid era (the period between the 6th and the 8th centuries CE) and the ʿAbbāsid era (starting in the year 750 CE). Some poems in each diwan are considered ‘original’, i.e., attributed to a certain poet with high confidence. The diwans also include, however, an additional section of poems that are attributed to the poet with reservations, meaning that these poems might have been composed by another poet and/or in another period. We trained a set of machine learning algorithms (classifiers) to explore whether such techniques can automatically identify the period in which a poem was written. In the training phase, we represent each poem using various types of features (characteristics) designed to capture lexical, topical, and stylistic aspects of this poetry.
By training and evaluating period prediction models on the ‘original’ poetry, we obtained highly encouraging results, with F1 scores between 0.73 and 0.90 for the various periods. Moreover, we observe that the stylistic features, which pertain to elements that characterize Arabic poetry, as well as the other feature types, are all indicative of the period in which a poem was written. We applied the resulting prediction models to poems for which the authorship period is under dispute (‘attributed’) and present some interesting anecdotal results, which suggest that some of these poems may belong to different eras, an issue to be further examined by Arabic poetry researchers.
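Since the results above are reported as F1 per period, the following is a minimal sketch of how a per-period F1 score can be computed from gold and predicted era labels. It is an illustration only, not the authors' evaluation code: the function name `per_class_f1`, the ASCII label strings, and the toy labels are all assumptions made for this example.

```python
from collections import Counter


def per_class_f1(gold, predicted):
    """Return {label: F1} for every label that occurs in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for era g
        else:
            fp[p] += 1          # era p was predicted wrongly
            fn[g] += 1          # era g was missed
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration; they are not taken from the dataset.
gold = ["pre-Abbasid"] * 4 + ["Abbasid"] * 3
predicted = ["pre-Abbasid", "pre-Abbasid", "pre-Abbasid", "Abbasid",
             "Abbasid", "Abbasid", "pre-Abbasid"]

scores = per_class_f1(gold, predicted)
for era, f1 in scores.items():
    print(f"{era}: F1 = {f1:.2f}")
```

Reporting F1 separately for each period, rather than a single accuracy figure, shows whether a model is consistently weaker on one era than on another.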
Keywords