A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity
Amel Fraisse,
Zheng Zhang,
Alex Zhai,
Ronald Jenn,
Shelley Fisher Fishkin,
Pierre Zweigenbaum,
Laurence Favier,
Widad Mustafa El Hadi
Affiliations
Amel Fraisse
Groupe d’Études et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Université de Lille, 59000 Lille, France
Zheng Zhang
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Université Paris-Saclay, 91400 Orsay, France
Alex Zhai
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Université Paris-Saclay, 91400 Orsay, France
Ronald Jenn
Centre d’Etudes en Civilisations, Langues et Littératures Etrangères (CECILLE), Université de Lille, 59000 Lille, France
Shelley Fisher Fishkin
Department of English, Stanford University, Stanford, CA 94305, USA
Pierre Zweigenbaum
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur-Centre National de la Recherche Scientifique (LIMSI-CNRS), Université Paris-Saclay, 91400 Orsay, France
Laurence Favier
Groupe d’Études et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Université de Lille, 59000 Lille, France
Widad Mustafa El Hadi
Groupe d’Études et de Recherche Interdisciplinaire en Information et Communication (GERiiCO), Université de Lille, 59000 Lille, France
This paper proposes a new collaborative and inclusive model for Knowledge Organization Systems (KOS) aimed at sustaining cultural heritage and language diversity. The model relies on contributions from end-users as well as from scientific and scholarly communities across borders, languages, nations, continents, and disciplines. It consists of collecting knowledge about all worldwide translations of one original work and sharing that data through a digital, interactive global knowledge map. The collected translations are processed to build multilingual parallel corpora for a large number of under-resourced languages and to highlight the transnational circulation of knowledge. Building such corpora is vital to preserving and expanding linguistic and traditional diversity. Our first experiment was conducted on the world-famous and well-traveled novel Adventures of Huckleberry Finn by the American author Mark Twain. This paper reports on 10 parallel corpora, each consisting of sentence-aligned pairs of English with one of the following languages: Basque (a European under-resourced language), Bulgarian, Dutch, Finnish, German, Hungarian, Polish, Portuguese, Russian, and Ukrainian. These corpora were produced from a set of 30 collected translations.