IEEE Access (Jan 2023)

Detecting Malicious JavaScript Using Structure-Based Analysis of Graph Representation

  • Muhammad Fakhrur Rozi,
  • Tao Ban,
  • Seiichi Ozawa,
  • Akira Yamada,
  • Takeshi Takahashi,
  • Sangwook Kim,
  • Daisuke Inoue

DOI
https://doi.org/10.1109/ACCESS.2023.3317266
Journal volume & issue
Vol. 11
pp. 102727 – 102745

Abstract

Read online

Malicious JavaScript code in web applications poses a significant threat as cyber attackers exploit it to perform various malicious activities. Detecting these malicious scripts is challenging, given their diverse nature and the continuous evolution of attack techniques. Most approaches formulate this task as a static or sequential feature of the script, which is insufficient in terms of flexibility to various attack techniques and the ability to capture the script’s semantic meaning. To address this issue, we propose an alternative approach that leverages JavaScript code’s abstract syntax tree (AST) representation, focusing on distinctive syntactic structure features. The proposed approach uses graph neural networks to extract structural features from the AST graph while considering the attribute features of individual nodes, which uses neural message passing with neighborhood aggregation. The proposed method encodes both the local AST graph structure and attributes of the nodes. It enables capturing the source code’s semantic meaning and exploits the signature structure in the AST representations. The proposed method consistently achieved high detection performance in extensive experiments on two different datasets, with accuracy scores of 99.4% and 96.92%. The obtained evaluation metrics demonstrate the effectiveness of our approach in accurately detecting malicious JavaScript code, with our proposed method successfully detecting more than 81% for various attack types and achieving an almost twofold performance improvement on JS-Droppers compared to the sequence-based approach. In addition, we observed that the AST graph structure represented the code’s semantic meaning, exhibiting distinctive patterns and signatures that could be effectively captured using the proposed method.

Keywords