Proceedings of the Institute for System Programming of the RAS (Труды Института системного программирования РАН), Oct 2018
Software deobfuscation methods: analysis and implementation
Abstract
This paper describes work on the development of deobfuscation software. The main purpose of the software is the analysis of obfuscated malware code. The need for this analysis arises because obfuscation techniques are widely used to protect implementations. The disassemblers that analysts typically use transform binary code into a human-readable form but neither simplify the result nor verify its correctness. In the past it was enough to clean up inserted garbage code by pattern matching, but obfuscation techniques are becoming more sophisticated and now require more complex methods of code analysis and simplification. Because deobfuscation requires analysis and transformation algorithms similar to those of an optimizing compiler, we evaluated the LLVM compiler infrastructure as a basis for deobfuscation software. Unlike a compiler, deobfuscation algorithms do not have full information about the program being analyzed, only a small part of it. The evaluation shows that using LLVM directly does not remove all obfuscation artifacts from the code, so producing cleaner output calls for an independent tool. Nevertheless, building deobfuscation software on LLVM or a similar compiler infrastructure is a feasible approach.
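To illustrate the pattern-matching cleanup that the abstract describes as the older approach, the following is a minimal sketch rather than the tool developed in this work. It assumes a hypothetical toy representation in which each instruction is a tuple of a mnemonic and its operands, and it ignores flags and other side effects. The names `Insn`, `is_noop`, `cancels` and `cleanup` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical toy form of an instruction: (mnemonic, operand, ...)
Insn = Tuple[str, ...]


def is_noop(insn: Insn) -> bool:
    """Single instructions that have no effect in this toy model."""
    if insn == ("nop",):
        return True
    # mov r, r leaves the register unchanged (assumption of the toy model)
    return insn[0] == "mov" and len(insn) == 3 and insn[1] == insn[2]


def cancels(a: Insn, b: Insn) -> bool:
    """Adjacent pairs whose combined effect is empty in this toy model."""
    # push r; pop r restores both the register and the stack pointer
    if a[0] == "push" and b[0] == "pop" and a[1:] == b[1:]:
        return True
    # add r, k; sub r, k (or the reverse) restores r; flags are ignored here
    if {a[0], b[0]} == {"add", "sub"} and a[1:] == b[1:]:
        return True
    return False


def cleanup(code: List[Insn]) -> List[Insn]:
    """Apply the patterns repeatedly until a pass removes nothing."""
    changed = True
    while changed:
        changed = False
        out: List[Insn] = []
        i = 0
        while i < len(code):
            if is_noop(code[i]):
                i += 1          # drop a single useless instruction
                changed = True
            elif i + 1 < len(code) and cancels(code[i], code[i + 1]):
                i += 2          # drop a self-cancelling pair
                changed = True
            else:
                out.append(code[i])
                i += 1
        code = out
    return code


obfuscated = [
    ("mov", "eax", "ebx"),
    ("push", "ecx"),
    ("add", "edx", "5"),
    ("sub", "edx", "5"),
    ("pop", "ecx"),
    ("nop",),
    ("ret",),
]
cleaned = cleanup(obfuscated)
```

The fixed-point loop matters because garbage can be nested: the `push`/`pop` pair only becomes adjacent after the `add`/`sub` pair between them is removed. Syntactic rules of this kind are easy to defeat, for example by interleaving unrelated instructions into a pattern, which is consistent with the abstract's point that newer obfuscation calls for compiler-style analysis.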