Instruction2vec: Efficient Preprocessor of Assembly Code to Detect Software Weakness with CNN

Yongjun Lee; Hyun Kwon; Sang-Hoon Choi; Seung-Ho Lim; Sung Hoon Baek; Ki-Woong Park

doi:10.3390/app9194086

Applied Sciences (Sep 2019)

Affiliations

Yongjun Lee: Graduate School of Information Security, Korea University, Seoul 02841, Korea
Hyun Kwon: School of Computing, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea
Sang-Hoon Choi: Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
Seung-Ho Lim: Division of Computer and Electronic Systems Engineering, Hankuk University of Foreign Studies, Seoul 02450, Korea
Sung Hoon Baek: Department of Computer System Engineering, Jungwon University, Chungcheongbuk-do 28024, Korea
Ki-Woong Park: Department of Computer and Information Security, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea

Journal volume & issue: Vol. 9, no. 19
p. 4086

Abstract

Potential software weaknesses, which can lead to exploitable security vulnerabilities, continue to pose a risk to computer systems. According to Common Vulnerabilities and Exposures (CVE), 14,714 vulnerabilities were reported in 2017, more than twice the number reported in 2016. Automated vulnerability detection has been recommended as an efficient way to find vulnerabilities. Among detection techniques, static binary analysis detects software weaknesses by matching existing patterns or rules; because of this reliance, new rules must be written and patched in whenever an unknown vulnerability is encountered, which is difficult. To overcome this limitation, we propose Instruction2vec, an improved static binary analysis technique using machine learning. Our framework consists of two steps: (1) it models assembly code efficiently using Instruction2vec, which is based on Word2vec; and (2) it learns the features of weak code through the feature extraction of Text-CNN, without creating patterns or rules, and detects new software weaknesses. To assess the efficiency of Instruction2vec, we compared the preprocessing performance of three frameworks: Instruction2vec, Word2vec, and Binary2img. We used the Juliet Test Suite, specifically the part related to Common Weakness Enumeration (CWE)-121, for training, and Securely Taking On New Executable Software of Uncertain Provenance (STONESOUP) for testing. Experimental results show that the proposed scheme detects software vulnerabilities in assembly code with an accuracy of 91%.
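The abstract names the two stages (Instruction2vec preprocessing, then Text-CNN feature extraction) but not their exact data layout. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: each instruction is assumed to split into one opcode slot plus two operands of four token slots each (a hypothetical layout), token vectors come from a lookup table standing in for trained Word2vec embeddings, and a single Text-CNN filter convolves over consecutive instruction vectors followed by ReLU and max-over-time pooling. The names `tokenize_instruction`, `embed_instruction`, `text_cnn_feature`, `NUM_OPERANDS`, and `OPERAND_SLOTS` are illustrative only.

```python
import re
from typing import Dict, List

PAD = "<pad>"
NUM_OPERANDS = 2   # hypothetical: at most two operands per instruction
OPERAND_SLOTS = 4  # hypothetical: each operand padded/truncated to 4 tokens
TOKEN_RE = re.compile(r"[a-z_][a-z0-9_]*|0x[0-9a-f]+|\d+|[+\-*]")


def tokenize_instruction(line: str) -> List[str]:
    """Split an Intel-syntax instruction into a fixed-length token list:
    [opcode, operand-1 slots..., operand-2 slots...]."""
    parts = line.strip().lower().split(None, 1)
    opcode = parts[0]
    operands = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    tokens = [opcode]
    for i in range(NUM_OPERANDS):
        operand = operands[i] if i < len(operands) else ""
        # e.g. "[ebp-0x8]" -> ["ebp", "-", "0x8"]; brackets are dropped.
        op_tokens = TOKEN_RE.findall(operand)[:OPERAND_SLOTS]
        tokens.extend(op_tokens + [PAD] * (OPERAND_SLOTS - len(op_tokens)))
    return tokens


def embed_instruction(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Concatenate per-token vectors into one instruction vector of length
    len(tokens) * dim. Unknown tokens and padding map to zero vectors."""
    vec: List[float] = []
    for tok in tokens:
        vec.extend(table.get(tok, [0.0] * dim))
    return vec


def text_cnn_feature(seq: List[List[float]], kernel: List[float], bias: float, width: int) -> float:
    """One Text-CNN filter: dot product over each window of `width`
    consecutive instruction vectors, ReLU, then max-pooling over time.
    Returns 0.0 if the sequence is shorter than the filter."""
    best = 0.0
    for start in range(len(seq) - width + 1):
        window = [x for vec in seq[start:start + width] for x in vec]
        activation = max(0.0, sum(w * x for w, x in zip(kernel, window)) + bias)
        best = max(best, activation)
    return best


if __name__ == "__main__":
    # Toy 2-dimensional table standing in for trained Word2vec vectors.
    table = {"mov": [1.0, 0.0], "eax": [0.0, 1.0], "ebp": [0.5, 0.5]}
    code = ["mov eax, [ebp-0x8]", "ret"]
    seq = [embed_instruction(tokenize_instruction(i), table, 2) for i in code]
    kernel = [0.1] * (2 * len(seq[0]))  # one filter spanning 2 instructions
    print(tokenize_instruction(code[0]))
    print(text_cnn_feature(seq, kernel, bias=0.0, width=2))
```

Fixing the number of slots per instruction gives every instruction a vector of the same length, which is what lets a convolution filter slide over instructions the way a Text-CNN slides over words in a sentence. A real model would learn many filters of several widths and feed the pooled features to a classifier.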

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
