The Effects of Traditional Anti-Virus Labels on Malware Detection Using Dynamic Runtime Opcodes

Domhnall Carlin; Alexandra Cowan; Philip O'Kane; Sakir Sezer

doi:10.1109/ACCESS.2017.2749538

IEEE Access (Jan 2017)

Affiliations

Domhnall Carlin: Centre for Secure Information Technologies, Queen’s University Belfast, Belfast, U.K.
Alexandra Cowan: Institute of Electronics, Communications and Information Technology, Queen’s University Belfast, Belfast, U.K.
Philip O'Kane: Centre for Secure Information Technologies, Queen’s University Belfast, Belfast, U.K.
Sakir Sezer: Centre for Secure Information Technologies, Queen’s University Belfast, Belfast, U.K.

DOI: https://doi.org/10.1109/ACCESS.2017.2749538
Journal volume & issue: Vol. 5
pp. 17742 – 17752

Abstract

The arms race between the distributors of malware and those seeking to provide defenses has so far favored the former. Signature detection methods have been unable to cope with the onslaught of new binaries aided by rapidly developing obfuscation techniques. Recent research has focused on the analysis of low-level opcodes, both static and dynamic, as a way to detect malware. Although sometimes successful at detecting malware, static analysis still fails to unravel obfuscated code, whereas dynamic analysis can allow researchers to investigate the revealed code at runtime. Research in the field has been limited by the underpinning data sets; old and inadequately sampled malware can lessen the extrapolation potential of such data sets. The main contribution of this paper is the creation of a new parsed runtime trace data set of over 100 000 labeled samples, which will address these shortcomings, and we offer the data set itself for use by the wider research community. This data set underpins the examination of the run traces using classifiers on count-based and sequence-based data. We find that malware detection rates are lessened when samples are labeled with traditional anti-virus (AV) labels. Neither count-based nor sequence-based algorithms can sufficiently distinguish between AV label classes. Detection increases when malware is re-classed with labels yielded from unsupervised learning. With sequence-based learning, detection exceeds that of labeling as simply “malware” alone. This approach may yield future work, where the triaging of malware can be more effective.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
