IEEE Access (Jan 2024)

Measuring and Improving the Energy Efficiency of Large Language Models Inference

  • Mauricio Fadel Argerich
  • Marta Patiño-Martínez

DOI
https://doi.org/10.1109/ACCESS.2024.3409745
Journal volume & issue
Vol. 12
pp. 80194–80207

Abstract

Recent improvements in the accuracy of machine learning (ML) models in the language domain have propelled their use in a multitude of products and services, touching millions of lives daily. These new levels of accuracy have been attained mainly through exponential growth in model size, creating a new category of models known as Large Language Models (LLMs) and leading to a substantial increase in computing and energy demands. While recent studies have focused on measuring and improving the energy consumption of LLMs during training, inference has received little attention. In this article, we present an approach to profile the energy consumption of LLMs during inference and leverage it to improve their energy efficiency. To this end, we deploy several state-of-the-art LLMs and observe how model size, number of layers, parallelized attention, and even vocabulary size affect their energy consumption. In addition, we leverage input batch size and different quantization levels to optimize inference energy efficiency and latency.
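
As a concrete illustration of the kind of per-request profiling the abstract describes, the minimal Python sketch below (not the authors' code) reads the GPU's cumulative energy counter through NVIDIA's NVML bindings (pynvml) before and after a single generation call. The model name, prompt, and generation length are illustrative assumptions, and the energy counter requires a Volta-class or newer NVIDIA GPU.

import pynvml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# Illustrative model choice; any causal LM from the Hugging Face Hub works.
name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).cuda().eval()

inputs = tokenizer("The energy cost of inference is", return_tensors="pt").to("cuda")

# nvmlDeviceGetTotalEnergyConsumption returns cumulative energy in millijoules,
# so the difference across generate() approximates the energy of this request.
e0 = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
torch.cuda.synchronize()
e1 = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{(e1 - e0) / 1000:.1f} J total, {(e1 - e0) / new_tokens:.1f} mJ/token")

pynvml.nvmlShutdown()

Repeating such a measurement while sweeping the input batch size, or loading the model at a lower quantization level (e.g., 8-bit weights), would reproduce the kind of energy-efficiency and latency trade-off study the abstract mentions.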
