IEEE Access (Jan 2020)

Abstract Syntax Tree Based Source Code Antiplagiarism System for Large Projects Set

  • Michal Duracik,
  • Patrik Hrkut,
  • Emil Krsak,
  • Stefan Toth

DOI
https://doi.org/10.1109/ACCESS.2020.3026422
Journal volume & issue
Vol. 8
pp. 175347 – 175359

Abstract

Read online

The paper deals with the issue of detecting plagiarism in source code, which we unfortunately encounter when teaching subjects dealing with programming and software development. Many students want to simplify the completion of the course and therefore submit modified source codes of their classmates or even those found on the Internet. Some try to modify the source code e.g. by changing the identifiers of classes, methods and variables to different ones, by changing the corresponding loops, by introducing new methods or by changing the order of methods in the source code or in other ways. We focused directly on this problem and designed our own anti-plagiarism system that we describe in this paper. The designed system consists of three parts during which the source code is processed using six designed algorithms. The basis is the processing of the source code and its transformation into an abstract syntax tree, consisting of two types of nodes, which is then vectorized using our modified DECKARD algorithm. The vectors are then clustered and stored in a database from which similar parts of the source code can be searched. The output of the system is then the final report containing a list of matches with similarities of all works that have been added to the database until then. The designed anti-plagiarism system is finally compared with the success of plagiarism detection performed by the two most used anti-plagiarism tools, namely JPlag and MOSS. It is evaluated on assignments elaborated by students from the courses dealing with object-oriented programming at our faculty.

Keywords