IEEE Access (Jan 2019)

Invalidating Analysis Knowledge for Code Virtualization Protection Through Partition Diversity

  • Wei Wang,
  • Meng Li,
  • Zhanyong Tang,
  • Huanting Wang,
  • Guixin Ye,
  • Fuwei Wang,
  • Jie Ren,
  • Xiaoqing Gong,
  • Dingyi Fang,
  • Zheng Wang

DOI
https://doi.org/10.1109/ACCESS.2019.2954165
Journal volume & issue
Vol. 7
pp. 169160 – 169173

Abstract

To protect programs from unauthorized analysis, virtualizing code with Virtual Machine (VM) technologies is emerging as a feasible method of code obfuscation. However, in some state-of-the-art VM-based protection approaches, the set of virtual instructions and the bytecode interpreter are fixed across the whole program. This means an experienced attacker could extract the mapping between virtual instructions and native code from one program and reuse this knowledge to uncover the mapping in other applications protected in the same way. To address this problem, we present CoDiver (Code Virtualization Protection with Diversity), a novel VM-based code obfuscation system. The main idea of our approach is to obfuscate the mapping between the opcodes of bytecode instructions and their semantics. To achieve this goal, we first partition every protected code region into multiple parts and then randomize the mapping between opcodes and their semantics in each part. In this way, the same bytecode instruction is translated into different native code in different sections of the obfuscated code, which significantly increases the diversity of program behavior. As a result, mapping knowledge learned from other programs is useless when migrated to a new program. We built a prototype of CoDiver and tested it on a set of real-world applications. Experimental results show that, compared with two state-of-the-art VM-based code obfuscation approaches, our approach is more effective and provides stronger protection with comparable runtime overhead and code size.
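The partition-and-randomize idea described in the abstract can be illustrated with a toy stack VM. The sketch below is not CoDiver's implementation: the three-instruction set, the fixed partition size, and the names `make_table`, `protect`, and `run` are all assumptions made for illustration. It only shows that the same opcode value can carry different semantics in different partitions, while the protected program still computes the same result.

```python
import random

# Hypothetical instruction set of a toy stack VM (not CoDiver's actual set).
SEMANTICS = ["PUSH", "ADD", "MUL"]

def make_table(rng):
    """Return a random semantics -> opcode mapping for one partition."""
    opcodes = list(range(len(SEMANTICS)))
    rng.shuffle(opcodes)
    return dict(zip(SEMANTICS, opcodes))

def protect(program, part_size, rng):
    """Split the program into partitions and encode each with its own table."""
    parts, tables = [], []
    for i in range(0, len(program), part_size):
        table = make_table(rng)
        code = []
        for name, *operand in program[i:i + part_size]:
            code.append(table[name])   # opcode depends on the partition
            code.extend(operand)       # operands are stored unchanged
        parts.append(code)
        tables.append(table)
    return parts, tables

def run(parts, tables):
    """Interpret each partition with the decode table that belongs to it."""
    stack = []
    for code, table in zip(parts, tables):
        decode = {op: name for name, op in table.items()}
        pc = 0
        while pc < len(code):
            name = decode[code[pc]]
            pc += 1
            if name == "PUSH":
                stack.append(code[pc])
                pc += 1
            elif name == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif name == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
    return stack

# (2 + 3) * 4, split into partitions of two instructions each.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
parts, tables = protect(program, 2, random.Random(1))
result = run(parts, tables)
print(result)  # prints [20]
```

Because each partition draws its own table, an attacker who recovers the opcode meanings in one partition cannot assume they hold in the next one, or in another program protected with a different random seed.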

Keywords