Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2020)

Duplicate and Plagiarism Search in Program Code Using Suffix Trees Over Compiled Code

  • Igor Andrianov,
  • Svetlana Rzheutskaya,
  • Alexey Sukonschikov,
  • Dmitry Kochkin,
  • Anatoly Shvetsov,
  • Arseny Sorokin

DOI
https://doi.org/10.23919/FRUCT48808.2020.9087465
Journal volume & issue
Vol. 26, no. 1
pp. 16 – 22

Abstract

The search for duplicate source code makes it possible both to improve the quality of the software being developed and to detect plagiarism. In this paper, we propose using a set of features of modern optimizing compilers to reduce this task to a similarity search over text fragments. Because the comparison is made on compiled code, many types of cosmetic changes to the source do not affect the search result. To search by similarity efficiently, we use sparse suffix trees built on binary-encoded data. Algorithms for constructing such a tree and performing a search are presented, and the application of the results to detecting cheating in a distance programming workshop is described.
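
The idea of searching compiled bytes through a sparse suffix index can be illustrated with a minimal sketch. It is not the authors' algorithm: a sorted suffix array stands in for the suffix tree, "sparse" is assumed here to mean indexing only every `step`-th byte offset, and the function names and the sample byte string are hypothetical.

```python
from __future__ import annotations


def build_sparse_suffix_array(data: bytes, step: int = 4) -> list[int]:
    """Sort the suffixes that start at multiples of `step` (assumed sparsity rule)."""
    return sorted(range(0, len(data), step), key=lambda i: data[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(data: bytes, sa: list[int], query: bytes) -> tuple[int, int]:
    """Return (offset in data, length) of the longest prefix of `query`
    found at an indexed offset, or (-1, 0) if nothing matches."""
    # Binary search for the position where `query` would sort.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # In sorted order, the best match is one of the two neighbours.
    best = (-1, 0)
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(data[sa[k]:], query)
            if n > best[1]:
                best = (sa[k], n)
    return best


# Hypothetical bytes standing in for a compiled function body.
compiled = b"\x55\x48\x89\xe5\x89\x7d\xfc\x8b\x45\xfc\x5d\xc3"
index = build_sparse_suffix_array(compiled, step=4)
fragment = b"\x89\x7d\xfc\x8b\x00"  # shares 4 bytes with offset 4
print(longest_match(compiled, index, fragment))
```

With a sparse index, a fragment is found only if it begins at an indexed offset, so a full search would also try the fragment at each of its first `step` starting positions; that loop is omitted here for brevity.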

Keywords