Identifying Compiler and Optimization Level in Binary Code From Multiple Architectures

Davide Pizzolotto; Katsuro Inoue

doi:10.1109/ACCESS.2021.3132950

IEEE Access (Jan 2021)

Affiliations

Davide Pizzolotto: Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
Katsuro Inoue: Graduate School of Information Science and Technology, Osaka University, Osaka, Japan

DOI: https://doi.org/10.1109/ACCESS.2021.3132950
Journal volume & issue: Vol. 9, pp. 163461 – 163475

Abstract

When compiling a native application, different compiler flags and optimization levels can be configured, and the choice depends on the application's requirements. For example, if the binary is intended for final release, the flags and optimization settings should favor execution speed and efficiency. Alternatively, if the application is to be used for debugging, debug flags should be configured, usually with minor or no code optimization. However, this information cannot be easily extracted from a compiled binary. Nonetheless, ensuring that binaries were built with the same compiler and compilation flags is particularly important when comparing different binary files, to avoid inaccurate or unreliable analyses. Unfortunately, understanding which flags and optimizations have been used requires deep knowledge of the target architecture and of the compiler. In this study, we present two deep learning models that detect both the compiler and the optimization level of a compiled binary. The optimization levels we study are O0, O1, O2, O3, and Os on the x86_64, AArch64, RISC-V, SPARC, PowerPC, MIPS, and ARM architectures. In addition, for the x86_64 and AArch64 architectures, we also determine whether the compiler is GCC or Clang. We created a dataset of more than 76,000 binaries and used it for training. Our experiments showed over 99.95% accuracy in detecting the compiler and between 92% and 98% accuracy, depending on the architecture, in detecting the optimization level. Furthermore, we analyzed the change in accuracy when the amount of data was extremely limited. Our study shows that it is possible to accurately detect both compiler flag settings and optimization levels with function-level granularity.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
