Applied Artificial Intelligence (Jul 2018)
Automatic Bilingual Dictionary Construction for Tirukural
Abstract
The Tirukural is a classic work of Tamil Sangam literature authored by Thiruvalluvar. It comprises 1330 kuratpas and has been translated into 37 world languages. This calls for cross-lingual access to the Tirukural on the World Wide Web, for which a bilingual dictionary is the primary knowledge base (KB), and this KB needs to be constructed. This article puts forth a methodology for the automatic construction of a Tamil-English bilingual dictionary for the Tirukural. The proposed methodology makes use of English and Tamil explanatory texts of the Tirukural: the English explanations written by G.U. Pope and the Tamil explanations written by Dr Varadharajan and Dr Solomon Pappaiya. A three-layered model is built using the Tirukural and its explanations. Naive Bayes probabilistic learning is used to learn the best mappings between the Tamil and English words. The proposed methodology has been tested with all 1330 Tamil kuratpas. An efficiency of 70% has been achieved, and performance has been compared across different types of English and Tamil explanatory texts. This method can be further extended to build bilingual dictionaries for other works of Tamil literature.
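To make the idea of learning Tamil-to-English word mappings with Naive Bayes more concrete, the following is a minimal sketch and not the article's three-layered model, whose details the abstract does not give. It assumes each kuratpa has already been tokenized and paired with the tokens of one English explanation, and it scores each candidate English word e for a Tamil word t by P(e) P(t | e), estimated from couplet-level co-occurrence counts with add-one smoothing. The function names (`train`, `best_mapping`, `build_dictionary`), the smoothing parameter `alpha`, and the transliterated toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(pairs):
    """Count couplet-level co-occurrences from (tamil_tokens, english_tokens) pairs."""
    cooc = defaultdict(Counter)  # cooc[e][t]: couplets in which e and t both appear
    e_count = Counter()          # e_count[e]: couplets whose English text contains e
    t_vocab = set()              # all Tamil words seen in training
    for tamil, english in pairs:
        t_set, e_set = set(tamil), set(english)
        t_vocab |= t_set
        for e in e_set:
            e_count[e] += 1
            for t in t_set:
                cooc[e][t] += 1
    return cooc, e_count, t_vocab, len(pairs)


def best_mapping(t, model, alpha=1.0):
    """Return the English word e that maximizes log P(e) + log P(t | e)."""
    cooc, e_count, t_vocab, n = model
    best, best_score = None, -math.inf
    for e in sorted(e_count):  # sorted so that ties break deterministically
        prior = e_count[e] / n
        total = sum(cooc[e].values())
        # Add-alpha smoothing keeps unseen (t, e) pairs from getting probability 0.
        likelihood = (cooc[e][t] + alpha) / (total + alpha * len(t_vocab))
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best, best_score = e, score
    return best


def build_dictionary(pairs, alpha=1.0):
    """Map every Tamil word in the training pairs to its best-scoring English word."""
    model = train(pairs)
    return {t: best_mapping(t, model, alpha) for t in sorted(model[2])}


# Toy data (an assumption for illustration, not taken from the Tirukural text):
# three "couplets" with transliterated Tamil tokens and English tokens.
toy_pairs = [
    (["anbu", "uyir"], ["love", "life"]),
    (["anbu", "aram"], ["love", "virtue"]),
    (["aram", "uyir"], ["virtue", "life"]),
]

dictionary = build_dictionary(toy_pairs)
print(dictionary)  # {'anbu': 'love', 'aram': 'virtue', 'uyir': 'life'}
```

On real explanatory texts, frequent function words such as "the" would dominate a score of this form, so a practical system would likely need stop-word filtering or additional evidence, such as the Tamil explanatory texts the article uses, before the mappings become usable.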