Detecting Malicious JavaScript Using Structure-Based Analysis of Graph Representation

Muhammad Fakhrur Rozi; Tao Ban; Seiichi Ozawa; Akira Yamada; Takeshi Takahashi; Sangwook Kim; Daisuke Inoue

doi:10.1109/access.2023.3317266

IEEE Access (Jan 2023)

Affiliations

Muhammad Fakhrur Rozi: National Institute of Information and Communications Technology, Koganei, Japan
Tao Ban: National Institute of Information and Communications Technology, Koganei, Japan
Seiichi Ozawa: Electrical and Electronics Engineering Department, Kobe University, Kobe, Japan
Akira Yamada: Electrical and Electronics Engineering Department, Kobe University, Kobe, Japan
Takeshi Takahashi: National Institute of Information and Communications Technology, Koganei, Japan
Sangwook Kim: Electrical and Electronics Engineering Department, Kobe University, Kobe, Japan
Daisuke Inoue: National Institute of Information and Communications Technology, Koganei, Japan

DOI: https://doi.org/10.1109/access.2023.3317266
Journal volume & issue: Vol. 11, pp. 102727–102745

Abstract

Malicious JavaScript code in web applications poses a significant threat because attackers exploit it to carry out a wide range of malicious activities. Detecting these scripts is challenging, given their diversity and the continuous evolution of attack techniques. Most existing approaches represent a script by static or sequential features, which adapt poorly to varied attack techniques and capture little of the script’s semantic meaning. To address this issue, we propose an alternative approach that leverages the abstract syntax tree (AST) representation of JavaScript code, focusing on distinctive features of its syntactic structure. The proposed approach uses graph neural networks, based on neural message passing with neighborhood aggregation, to extract structural features from the AST graph while also considering the attribute features of individual nodes. It thus encodes both the local AST graph structure and the node attributes, which enables it to capture the source code’s semantic meaning and to exploit signature structures in the AST representation. In extensive experiments on two different datasets, the proposed method consistently achieved high detection performance, with accuracy scores of 99.4% and 96.92%. The evaluation metrics demonstrate the effectiveness of the approach in detecting malicious JavaScript code: it achieved detection rates above 81% for various attack types and an almost twofold performance improvement on JS-Droppers compared to the sequence-based approach. In addition, we observed that the AST graph structure represents the code’s semantic meaning, exhibiting distinctive patterns and signatures that the proposed method captures effectively.
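
The message-passing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the AST below is hand-written in an ESTree-like style for the snippet `eval(a + b)` rather than produced by a JavaScript parser, node attributes are reduced to one-hot node types, and a single unweighted mean aggregation stands in for a trained graph neural network layer. All function names are hypothetical.

```python
# Hand-written, ESTree-style AST for the JavaScript snippet `eval(a + b)`.
# Assumption: a real pipeline would obtain this from a JavaScript parser.
AST = {
    "type": "Program",
    "children": [
        {"type": "ExpressionStatement", "children": [
            {"type": "CallExpression", "children": [
                {"type": "Identifier", "children": []},
                {"type": "BinaryExpression", "children": [
                    {"type": "Identifier", "children": []},
                    {"type": "Identifier", "children": []},
                ]},
            ]},
        ]},
    ],
}


def flatten(ast):
    """Return (node_types, parent_child_edges) for a nested-dict AST."""
    types, edges = [], []

    def visit(node, parent):
        idx = len(types)
        types.append(node["type"])
        if parent is not None:
            edges.append((parent, idx))
        for child in node["children"]:
            visit(child, idx)

    visit(ast, None)
    return types, edges


def one_hot_features(types):
    """Encode each node's type as a one-hot attribute vector."""
    vocab = sorted(set(types))
    return [[1.0 if t == v else 0.0 for v in vocab] for t in types], vocab


def aggregate(features, edges):
    """One round of mean aggregation over each node and its neighbors."""
    neighbors = [[i] for i in range(len(features))]  # each node includes itself
    for u, v in edges:  # treat parent-child edges as undirected
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]


def readout(features):
    """Mean-pool node vectors into a single graph-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]


node_types, edges = flatten(AST)
x, vocab = one_hot_features(node_types)
h = aggregate(x, edges)          # node vectors now mix in local structure
graph_embedding = readout(h)     # fixed-size vector a classifier could consume
```

After one aggregation round, each node vector reflects the types of its parent and children as well as its own, which is the sense in which both local structure and node attributes are encoded. A trained model would add learned weights, nonlinearities, and several such rounds.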

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
