IEEE Access (Jan 2023)

Meta-Heuristic Guided Feature Optimization for Enhanced Authorship Attribution in Java Source Code

  • Bilal Al-Ahmad,
  • Nailah Al-Madi,
  • Abdullah Alzaqebah,
  • Rami S. Alkhawaldeh,
  • Khaled Aldebei,
  • Md. Faisal Kabir,
  • Ismail Altaharwa,
  • Mua'ad Abu-Faraj,
  • Ibrahim Aljarah

DOI
https://doi.org/10.1109/ACCESS.2023.3341395
Journal volume & issue
Vol. 11
pp. 141657 – 141673

Abstract

Source code authorship attribution is the task of identifying who wrote a piece of code by learning the programmer's style. It is a critical activity used extensively in areas such as computer security, computer law, and plagiarism detection. This paper investigates source code authorship attribution by capturing natural language aspects of the code, rather than using only a minimal set of syntactic and stylistic code features as explored in the previous literature. It proposes an evolutionary feature selection model to improve the accuracy of authorship attribution by implementing two language models (uni-gram and bi-gram). The proposed approach uses K-Nearest Neighbor as the classifier and a Genetic Algorithm as the feature selection technique. Two experiments were conducted on a public Authorship Attribution dataset from GitHub, covering various evolutionary feature selection models. The results of both experiments were compared with related studies and show a significant improvement in accuracy.
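
The pipeline named in the abstract (uni-gram and bi-gram features, a Genetic Algorithm that selects a feature subset, and K-Nearest Neighbor as the classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The tokenizer, the Euclidean distance, the leave-one-out fitness, the truncation selection, the one-point crossover, and every function name and parameter value below are assumptions chosen only to keep the example short and self-contained.

```python
import random
import re
from collections import Counter

def ngram_counts(code, n):
    """Count token-level n-grams (n=1: uni-gram, n=2: bi-gram) in a code string."""
    tokens = re.findall(r"\w+|[^\w\s]", code)  # assumed tokenizer: words and single symbols
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def vectorize(samples, n_values=(1, 2)):
    """Build a shared n-gram vocabulary and one count vector per sample."""
    counts = [sum((ngram_counts(s, n) for n in n_values), Counter()) for s in samples]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]

def knn_predict(train_X, train_y, x, mask, k=1):
    """Majority vote among the k nearest training vectors, using masked features only."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q, m in zip(a, b, mask) if m)
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def fitness(mask, X, y, k=1):
    """Leave-one-out KNN accuracy for the feature subset encoded by a 0/1 mask."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]
    return hits / len(X)

def genetic_select(X, y, pop_size=12, generations=15, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask with a simple generational genetic algorithm."""
    rng = random.Random(seed)
    n = len(X[0])  # assumes at least two features, so a crossover point exists
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation: each bit flips with probability mutation_rate
            children.append([bit ^ (rng.random() < mutation_rate) for bit in child])
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))
```

Calling `vectorize` on a list of source strings and passing the resulting vectors with their author labels to `genetic_select` returns a mask with one bit per n-gram; `fitness` then reports how well KNN separates the authors using only the selected n-grams. The sketch scores masks on the same samples it searches over, so a real evaluation would need a separate held-out test split.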

Keywords