Journal of Engineering Science and Technology Review (Mar 2016)
Discovering Clusters of Plagiarism in Students’ Source Codes
Abstract
Plagiarism in students’ source codes is a significant problem for the educational process. In addition, detecting plagiarism in source codes is a time-consuming and tiresome task. Therefore, many approaches for plagiarism detection have been proposed. Most of these approaches receive as input a set of source files and calculate a similarity score for each pair of files in the input set. However, the tutor often needs to detect the clusters of plagiarism, i.e. clusters of students’ assignments such that all assignments in a cluster derive from a common original. In this paper, we propose a novel plagiarism detection algorithm that receives as input a set of source codes and calculates the clusters of plagiarism. Experimental results show the efficiency of our approach and encourage further research.
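To make the distinction between pairwise similarity and clusters of plagiarism concrete, the following is a minimal illustrative sketch and not the algorithm proposed in this paper. It assumes a generic text similarity (Python's `difflib.SequenceMatcher`), a hypothetical threshold of 0.8, and a simple grouping rule in which two submissions fall into the same cluster when they are linked by a chain of sufficiently similar pairs. The names `similarity` and `plagiarism_clusters` are introduced here only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Set


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two source texts.

    Assumption: plain character-level matching stands in for whatever
    similarity measure a real detector would use.
    """
    return SequenceMatcher(None, a, b).ratio()


def plagiarism_clusters(sources: Dict[str, str],
                        threshold: float = 0.8) -> List[Set[str]]:
    """Group submissions linked by pairwise similarity >= threshold.

    `sources` maps a submission name to its source text. The threshold
    value is a hypothetical choice, not one taken from the paper.
    """
    # Union-find structure: every submission starts as its own cluster.
    parent = {name: name for name in sources}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: compare every pair, as pairwise detectors do.
    for a, b in combinations(sources, 2):
        if similarity(sources[a], sources[b]) >= threshold:
            # Step 2: merge the clusters of two similar submissions.
            parent[find(a)] = find(b)

    # Step 3: collect the members of each cluster by root.
    groups: Dict[str, Set[str]] = {}
    for name in sources:
        groups.setdefault(find(name), set()).add(name)

    # Report only clusters with at least two submissions.
    return [group for group in groups.values() if len(group) > 1]
```

Under these assumptions, a tutor would call `plagiarism_clusters` with a mapping from student names to source texts and receive a list of sets of names, each set being one suspected cluster. Grouping by connected components is a deliberately simple choice: it can merge loosely related submissions through chains of similar pairs, which is one reason a dedicated clustering algorithm may be preferable.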