网络与信息安全学报 (Chinese Journal of Network and Information Security), Apr 2023
Binary program taint analysis optimization method based on function summary
Abstract
Taint analysis is a popular software analysis method that is widely used in information security. Most existing dynamic taint analysis frameworks for binary programs rely on instruction-level instrumentation, which usually incurs a large performance overhead and slows program execution by several times or even tens of times. This limits the use of taint analysis on complex malicious samples and commercial software. To improve the efficiency of taint analysis, reduce the performance loss caused by instruction-level instrumentation, and make taint analysis more widely applicable in software analysis, an optimization method based on function summaries was proposed. The method replaced instruction-level taint propagation rules with function-level taint propagation rules, which reduced the number of data flow propagation steps to be analyzed and effectively improved the efficiency of taint analysis. A definition of the function summary was given, and summary generation algorithms for different function structures were studied. Within a function, a path-sensitive analysis method was designed for acyclic structures, and a finite iteration method was designed for cyclic structures. The two methods were then combined to generate summaries for functions with mixed structures. Based on this research, a general taint analysis framework called FSTaint was designed and implemented, consisting of a function summary generation module, a data flow recording module, and a taint analysis module. FSTaint was evaluated on real APT malicious samples, where its taint analysis efficiency was 7.75 times that of libdft. In terms of accuracy, FSTaint has more accurate and complete propagation rules than libdft.
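To make the idea of function-level propagation concrete, the following is a minimal Python sketch, not FSTaint's actual implementation. It assumes a summary can be represented as a map from each output location of a function to the input locations that may taint it. The names `FunctionSummary`, `apply_summary`, and `iterate_loop`, the location labels, and the iteration bound are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass
class FunctionSummary:
    """Hypothetical summary: output location -> input locations that taint it."""
    name: str
    propagation: Dict[str, FrozenSet[str]]


def apply_summary(summary: FunctionSummary, tainted: Set[str]) -> Set[str]:
    """Propagate taint across one call with a single summary lookup,
    instead of applying a rule for every instruction in the function body."""
    outputs = set(summary.propagation)
    # Assumption: every output location is overwritten, so its old taint is dropped.
    result = {loc for loc in tainted if loc not in outputs}
    for out, inputs in summary.propagation.items():
        if any(src in tainted for src in inputs):
            result.add(out)
    return result


def iterate_loop(body: FunctionSummary, tainted: Set[str], max_iters: int = 8) -> Set[str]:
    """Bounded iteration over a loop body: stop once the tainted set stops
    growing or the (assumed) iteration limit is reached."""
    current = set(tainted)
    for _ in range(max_iters):
        # Union keeps the taint from executing the body zero or more times.
        updated = current | apply_summary(body, current)
        if updated == current:
            break
        current = updated
    return current


# A copy-like function whose destination is tainted by its source.
copy_like = FunctionSummary("copy_like", {"dst": frozenset({"src"})})
after_call = apply_summary(copy_like, {"src"})

# A loop body in which taint flows a -> b -> c across iterations.
loop_body = FunctionSummary(
    "loop_body", {"b": frozenset({"a"}), "c": frozenset({"b"})}
)
after_loop = iterate_loop(loop_body, {"a"})
```

In this sketch, `after_call` contains both `src` and `dst`, and `after_loop` contains `a`, `b`, and `c`, because the second pass over the loop body carries the taint from `b` to `c`. The bounded loop is only one simple way to realize a finite iteration scheme; the abstract does not specify how the bound or the convergence check is chosen.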