Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology), Sep 2022

SQL-Detector: SQL Plagiarism Detection Technique Based on Coding Features

  • XU Jia, MO Xiaokun, YU Ge, LYU Pin, WEI Tingting

DOI
https://doi.org/10.3778/j.issn.1673-9418.2103011
Journal volume & issue
Vol. 16, no. 9
pp. 2030–2040

Abstract


Mastering structured query language (SQL) is key to learning database technology. However, teaching practice shows that some students plagiarize when doing SQL exercises. Existing SQL plagiarism detection techniques either flag plagiarized submissions simply by measuring the similarity between students' SQL submissions, or identify plagiarism by analyzing only the simple coding features that appear in students' SQL code. Neither approach makes good use of the rich coding features students exhibit when writing SQL, which limits detection accuracy. To address this, this paper proposes SQL-Detector, an SQL plagiarism detection technique based on students' coding features. SQL-Detector first profiles each student by extracting two kinds of features: exercise coding features, which describe the student's code for a specific SQL exercise, and exercise generalization coding features, which reflect the student's coding habits. SQL-Detector then identifies plagiarism groups by clustering the exercise coding features of all students. Finally, it distinguishes copiers from givers by checking whether each student's exercise generalization coding features are consistent with that student's historical generalization coding features. The technique is evaluated on a dataset collected from real classroom practice. Experimental results show that the plagiarism detection accuracy of SQL-Detector is on average 14.0% higher than that of the state-of-the-art technique.
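The abstract names the three stages of SQL-Detector but not its concrete features, clustering algorithm, or consistency measure. The following is only a minimal Python sketch of such a pipeline under assumptions that are not taken from the paper: exercise coding features are modeled as the set of non-keyword tokens (table names, aliases, columns, literals) in one submission; "clustering" is single-linkage grouping by Jaccard similarity with a hypothetical threshold of 0.9; generalization coding features are three hypothetical style habits (upper-case keywords, trailing semicolon, multi-line layout); and within a group, the member whose current style is closest to their own historical average is taken as the giver. All function names, the keyword list, and the sample data are illustrative and hypothetical.

```python
import math
import re
from statistics import mean

# Hypothetical keyword list; a real system would use a full SQL grammar.
KEYWORDS = {"select", "from", "where", "join", "on", "as", "group", "by", "order", "and"}


def exercise_features(sql: str) -> set[str]:
    """Hypothetical exercise features: lower-cased non-keyword tokens
    (identifiers, quoted literals, numbers) used in one submission."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z_0-9]*|'[^']*'|\d+", sql)
    return {t.lower() for t in tokens if t.lower() not in KEYWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster(submissions: dict[str, str], threshold: float = 0.9) -> list[list[str]]:
    """Single-linkage grouping via union-find: two students are linked when the
    Jaccard similarity of their token sets is >= threshold. Returns groups of size > 1."""
    parent = {s: s for s in submissions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = sorted(submissions)
    feats = {n: exercise_features(submissions[n]) for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(feats[a], feats[b]) >= threshold:
                parent[find(a)] = find(b)
    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return [g for g in groups.values() if len(g) > 1]


def style_features(sql: str) -> list[float]:
    """Hypothetical generalization (habit) features of one submission."""
    kws = [w for w in re.findall(r"[A-Za-z_]+", sql) if w.lower() in KEYWORDS]
    upper_ratio = sum(w.isupper() for w in kws) / len(kws) if kws else 0.0
    semicolon = 1.0 if sql.rstrip().endswith(";") else 0.0
    multiline = 1.0 if "\n" in sql.strip() else 0.0
    return [upper_ratio, semicolon, multiline]


def inconsistency(current_sql: str, history: list[str]) -> float:
    """Euclidean distance between the current style and the mean historical style."""
    if not history:
        raise ValueError("need at least one historical submission")
    past = [style_features(h) for h in history]
    avg = [mean(col) for col in zip(*past)]
    return math.dist(style_features(current_sql), avg)


def label_group(group: list[str], submissions: dict[str, str],
                histories: dict[str, list[str]]) -> dict[str, str]:
    """Simplification: the most self-consistent member is the giver, the rest are copiers."""
    scores = {s: inconsistency(submissions[s], histories[s]) for s in group}
    giver = min(scores, key=scores.get)
    return {s: ("giver" if s == giver else "copier") for s in group}


# Hypothetical sample data: bob submits alice's query verbatim.
SUBMISSIONS = {
    "alice": "SELECT s.name FROM student AS s WHERE s.age > 20;",
    "bob": "SELECT s.name FROM student AS s WHERE s.age > 20;",
    "carol": "select name from student where age > 20",
}
HISTORIES = {
    "alice": ["SELECT * FROM course;", "SELECT cno FROM sc WHERE grade > 60;"],
    "bob": ["select * from course", "select cno from sc where grade > 60"],
    "carol": ["select * from course", "select sno from sc"],
}

if __name__ == "__main__":
    groups = cluster(SUBMISSIONS)
    print(groups)
    for g in groups:
        print(label_group(g, SUBMISSIONS, HISTORIES))
```

Running the script prints `[['alice', 'bob']]` and then `{'alice': 'giver', 'bob': 'copier'}`. The giver heuristic follows the intuition in the abstract: a copied submission carries the giver's habits, so it matches the giver's history but deviates from the copier's. Assuming exactly one giver per group is a further simplification of this sketch.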

Keywords