Automated deobfuscation and family classification system for Excel 4.0 macros

LI Chenguang; YANG Xiuzhang; PENG Guojun

Chinese Journal of Network and Information Security (网络与信息安全学报) (Jun 2024)

Journal volume & issue: Vol. 10, pp. 66–80

Abstract

Cyber-attacks that leverage malicious Excel 4.0 macros (XLM) embedded in documents have surged in recent years. Malicious XLM code is often heavily obfuscated, which makes it difficult for conventional analysis methods and detection systems to determine the actual functionality of large numbers of samples. To counter the diverse obfuscation strategies used in malicious samples, an automated system named XLMRevealer was developed to deobfuscate XLM and extract key Indicators of Compromise (IOCs). XLMRevealer was built on abstract syntax trees and execution simulation, and it implemented 138 macro function handlers. On this basis, Word and Token features tailored to the characteristics of XLM code were extracted and combined through feature fusion to capture multi-level, fine-grained features. XLMRevealer used a CNNBiLSTM model to learn family-related correlations across dimensions and perform family classification. Finally, a dataset of 2346 samples from five distinct sources was constructed for the deobfuscation and family classification experiments. Results indicated that XLMRevealer achieved a 71.3% deobfuscation success rate, outperforming XLMMacroDeobfuscator and SYMBEXCEL by 20.8% and 15.8%, respectively. Its efficiency was stable, with an average processing time of only 0.512 seconds. The family classification accuracy on deobfuscated XLM code was 94.88%, surpassing all baseline models and supporting the effectiveness of combining Word and Token features. Furthermore, to assess the impact of deobfuscation on family classification while accounting for differences in obfuscation techniques across families, experiments were also conducted on the original and on uniformly obfuscated XLM code. The accuracies were 89.58% and 53.61%, respectively, indicating that the model can learn obfuscation features and confirming that deobfuscation significantly improves family classification.

Keywords: malicious macro document; Excel 4.0 macro; deobfuscation; family classification
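
The abstract describes deobfuscation built on abstract syntax trees, execution simulation, and per-function macro handlers. The following is only a minimal sketch of that general idea, not XLMRevealer's actual implementation: the names `HANDLERS`, `tokenize`, `parse`, `evaluate`, and `deobfuscate` are hypothetical, only three XLM functions (`CHAR`, `MID`, `LEN`) and the `&` concatenation operator are modeled, and cell references, control flow, and every other XLM feature are ignored.

```python
import re

# Hypothetical handler table: one Python callable per simulated XLM function.
# The paper reports 138 handlers; only three are sketched here.
HANDLERS = {
    "CHAR": lambda n: chr(int(n)),
    "MID": lambda s, start, length: str(s)[int(start) - 1:int(start) - 1 + int(length)],
    "LEN": lambda s: len(str(s)),
}

TOKEN_RE = re.compile(
    r'\s*(?:(?P<num>\d+)|"(?P<str>(?:[^"]|"")*)"|(?P<name>[A-Za-z_.]+)|(?P<op>[&(),]))'
)

def tokenize(text):
    """Split a formula body into (kind, value) tokens."""
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"cannot tokenize at offset {pos}")
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "str":
            value = value.replace('""', '"')  # Excel escapes quotes by doubling
        tokens.append((kind, value))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build a small AST of nested tuples using recursive descent."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def operand():
        kind, value = take()
        if kind == "num":
            return ("num", int(value))
        if kind == "str":
            return ("str", value)
        if kind == "name" and peek() == ("op", "("):
            take()  # consume "("
            args = []
            if peek() != ("op", ")"):
                args.append(expr())
                while peek() == ("op", ","):
                    take()
                    args.append(expr())
            if take() != ("op", ")"):
                raise ValueError("expected ')'")
            return ("call", value.upper(), args)
        raise ValueError(f"unexpected token {value!r}")

    def expr():
        node = operand()
        while peek() == ("op", "&"):
            take()
            node = ("concat", node, operand())
        return node

    tree = expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree

def evaluate(node):
    """Simulate execution of the AST by dispatching calls to HANDLERS."""
    kind = node[0]
    if kind in ("num", "str"):
        return node[1]
    if kind == "concat":
        return str(evaluate(node[1])) + str(evaluate(node[2]))
    name, args = node[1], node[2]
    if name not in HANDLERS:
        raise NotImplementedError(f"no handler for {name}")
    return HANDLERS[name](*[evaluate(arg) for arg in args])

def deobfuscate(formula):
    """Return the value an obfuscated formula evaluates to."""
    return evaluate(parse(tokenize(formula.lstrip("="))))

if __name__ == "__main__":
    print(deobfuscate('=CHAR(104)&CHAR(116)&"tp"'))
```

Keeping the handlers in a lookup table separates parsing from simulation, so supporting another function in this sketch only requires a new table entry. Unsupported functions raise `NotImplementedError` rather than being guessed at.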

Published in Chinese Journal of Network and Information Security (网络与信息安全学报)

ISSN: 2096-109X (Print)
Publisher: POSTS&TELECOM PRESS Co., LTD
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.infocomm-journal.com/cjnis/CN/2096-109X/home.shtml
