Engineering and Applied Science Research (Dec 2018)
A novel technique for Thai document plagiarism detection using syntactic parse trees
Abstract
Plagiarism is a serious offense, and under most Thai university rules all involved parties are penalized. The lack of effective tools for detecting plagiarism in the Thai language is a problem for academic and research institutes in Thailand. A practical framework and detection tool would help promote academic integrity and honesty. This paper presents an effective alternative method for detecting plagiarism in Thai academic articles using a syntactic parse tree (SPT) technique. The main concept of the method is to weight each sentence dynamically according to the roles of its words. In an empirical comparison with existing techniques and tools (tri-grams, semantic role labeling (SRL), Turnitin, and Akarawisut), SPT yields comparable or higher precision and recall in all four plagiarism cases studied: word-by-word, word-reordering, modifier-insertion, and synonym-replacement plagiarism. SPT shows promise and should be incorporated into similarity comparison tools to improve the accuracy of plagiarism detection in the Thai language.
Keywords