IEEE Access (Jan 2024)
Climbing the Hill to Understand the Code
Abstract
Software maintenance takes up a disproportionately large share of the modern software life cycle. A common problem is understanding the original code that is being restructured and improved, and this is especially true of low-level code. This paper investigates the results and properties of an automated process that raises the abstraction level of code from low-level operations to high-level structures. The process is built from independent components and can be adapted to different scenarios. Its implementation relies on the program transformation system FermaT and its catalogue of semantics-preserving transformations. The process uses hill climbing, with a metric serving as the fitness function for programs. This component was designed to work on general inputs, without explicit knowledge of the kind of source the program came from. The paper explores how the system actually handles different inputs, what properties the process exhibits, and how these can be used for further improvements. Two main types of input are shown: x86 assembly and MicroJava bytecode. The two differ in many operational respects, and the translator tools introduce further differences; nonetheless, the same process handles all of them and, on average, improves the Structure metric (a good approximation of code complexity) by around 85%.
Keywords