GLDOC: detection of implicitly malicious MS-Office documents using graph convolutional networks

Wenbo Wang; Peng Yi; Taotao Kou; Weitao Han; Chengyu Wang

doi:10.1186/s42400-024-00243-7

Cybersecurity (Jul 2024)

Affiliations

Wenbo Wang: PLA Information Engineering University
Peng Yi: PLA Information Engineering University
Taotao Kou: Shanxi Binhe Research Institute
Weitao Han: PLA Information Engineering University
Chengyu Wang: PLA Information Engineering University

DOI: https://doi.org/10.1186/s42400-024-00243-7
Journal volume & issue: Vol. 7, no. 1, pp. 1–14

Abstract

Malicious MS-Office documents have become one of the most effective attack vectors in APT attacks. Although many protection mechanisms exist, they have proved easy to bypass, and existing detection methods perform poorly on malicious documents that exploit unknown vulnerabilities or exhibit few malicious behaviors. In this paper, we first define im-documents: vulnerable documents that show implicitly malicious behaviors and evade most public antivirus engines. We then present GLDOC, a GCN-based framework that aims to detect im-documents effectively through dynamic analysis and to cover the blind spots of earlier detection methods. Beyond system calls, which are the sole focus of most prior research, we capture all dynamic behaviors in a sandbox, take the process tree into consideration, and reconstruct both into graphs. Using one channel to learn each graph, GLDOC trains a 2-channel network together with a classifier, formulating malicious document detection as a graph learning and classification problem. Experiments show that GLDOC achieves a good balance between accuracy and false alarm rate (95.33% and 4.33%, respectively), outperforming other detection methods. In further testing on a simulated 5-day attack scenario, the proposed framework maintains stable, high detection accuracy on unknown vulnerabilities.

Published in Cybersecurity

ISSN: 2523-3246 (Online)
Publisher: SpringerOpen
Country of publisher: Singapore
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://cybersecurity.springeropen.com/
