Selecting the Best Compiler Optimization by Adopting Natural Language Processing

Hameeza Ahmed; Muhammad Fahim Ul Haque; Hashim Raza Khan; Ghalib Nadeem; Kamran Arshad; Khaled Assaleh; Paulo Cesar Santos

doi:10.1109/ACCESS.2024.3451516

IEEE Access (Jan 2024)

Affiliations

Hameeza Ahmed: Department of Computer and Information Systems Engineering, NED University of Engineering and Technology, Karachi, Pakistan
Muhammad Fahim Ul Haque: Department of Telecommunications Engineering, NED University of Engineering and Technology, Karachi, Pakistan
Hashim Raza Khan: Department of Engineering Sciences and Technology, Iqra University, Karachi, Pakistan
Ghalib Nadeem: Department of Engineering Sciences and Technology, Iqra University, Karachi, Pakistan
Kamran Arshad: Department of Electrical and Computer Engineering, College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
Khaled Assaleh: Department of Electrical and Computer Engineering, College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
Paulo Cesar Santos: Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil

DOI: https://doi.org/10.1109/ACCESS.2024.3451516
Journal volume & issue: Vol. 12, pp. 121700–121711

Abstract

A compiler is a tool that converts high-level language code into assembly code after applying relevant optimizations. Automatically selecting suitable optimizations from a large optimization space is a non-trivial task, mainly accomplished using hardware profiling and application-level features. These features are then passed to an intelligent algorithm that predicts the desired optimizations. However, collecting these features requires executing the application beforehand, which incurs high overhead. With the evolution of Natural Language Processing (NLP), the performance of an application can be predicted at compile time solely through source code analysis. There has been substantial work on source code analysis using NLP, but most of it focuses on offloading computation to suitable devices or detecting code vulnerabilities; it has yet to be used to identify the best optimization sequence for an application. Moreover, most works have focused on finding the best machine learning or deep learning algorithm, ignoring the other important phases of the NLP pipeline. This paper pioneers the use of NLP to predict the best set of optimizations for a given application at compile time. Furthermore, it studies the impact of four vectorization and seven regression techniques on predicting application performance. For most applications, we show that TF-IDF vectorization and Huber regression yield the best outcomes. On average, the optimization sequence predicted by the proposed technique incurs a performance drop of 18% compared to the actual best combination, with a minimum drop of merely 0.5%.
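
The pipeline the abstract names (TF-IDF vectorization of source code followed by Huber regression on performance) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. The tokenizer, the `<opt:...>` token used to encode an optimization level, the two toy programs, and the runtimes are all hypothetical and exist only to make the example run; the regression is fitted by plain gradient descent on the Huber loss.

```python
import math
import re
from collections import Counter

def tokenize(source):
    """Crude lexer: words/numbers as tokens, every other symbol on its own."""
    return re.findall(r"\w+|[^\w\s]", source)

def fit_tfidf(docs):
    """Return (vocabulary, idf) using a smoothed inverse document frequency."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf_vector(doc, vocab, idf):
    """L2-normalised TF-IDF vector; tokens outside the vocabulary are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def huber_loss(r, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    return 0.5 * r * r if abs(r) <= delta else delta * (abs(r) - 0.5 * delta)

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss with respect to the residual r."""
    return r if abs(r) <= delta else math.copysign(delta, r)

def predict(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def mean_loss(X, y, w, b, delta=1.0):
    return sum(huber_loss(predict(x, w, b) - t, delta) for x, t in zip(X, y)) / len(X)

def fit_huber(X, y, delta=1.0, lr=0.1, epochs=500):
    """Linear model fitted by full-batch gradient descent on the mean Huber loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grads = [huber_grad(predict(x, w, b) - t, delta) for x, t in zip(X, y)]
        w = [wj - lr * sum(g * x[j] for g, x in zip(grads, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(grads) / n
    return w, b

# Hypothetical toy data: two C snippets and made-up runtimes in seconds.
PROGRAMS = {
    "dot": "for (int i = 0; i < n; i++) sum += a[i] * b[i];",
    "list": "while (p) { count++; p = p->next; }",
}
FLAGS = ["O1", "O2", "O3"]
RUNTIME = {
    ("dot", "O1"): 3.0, ("dot", "O2"): 2.0, ("dot", "O3"): 1.0,
    ("list", "O1"): 3.0, ("list", "O2"): 2.5, ("list", "O3"): 2.0,
}

def sample(source, flag):
    """Assumed encoding: source tokens plus one token naming the optimization."""
    return tokenize(source) + ["<opt:" + flag + ">"]

keys = list(RUNTIME)
docs = [sample(PROGRAMS[p], f) for p, f in keys]
vocab, idf = fit_tfidf(docs)
X = [tfidf_vector(d, vocab, idf) for d in docs]
y = [RUNTIME[k] for k in keys]
w, b = fit_huber(X, y)

# At "compile time": score every candidate for unseen code, keep the fastest.
new_source = "for (int j = 0; j < m; j++) total += x[j];"
scores = {f: predict(tfidf_vector(sample(new_source, f), vocab, idf), w, b)
          for f in FLAGS}
best = min(scores, key=scores.get)
print(best, scores)
```

The point of the sketch is that no execution of the new program is needed: its predicted runtime under each candidate optimization comes from the text of the source alone, and the candidate with the lowest prediction is selected. A practical setup would likely rely on a library vectorizer and regressor and a far larger training set.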

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
