Learning-Based Online QoE Optimization in Multi-Agent Video Streaming

Yimeng Wang; Mridul Agarwal; Tian Lan; Vaneet Aggarwal

doi:10.3390/a15070227

Algorithms (Jun 2022)

Affiliations

Yimeng Wang: Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA
Mridul Agarwal: School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
Tian Lan: Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA
Vaneet Aggarwal: School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA

DOI: https://doi.org/10.3390/a15070227
Journal volume & issue: Vol. 15, no. 7, p. 227

Abstract

Video streaming has become a major usage scenario for the Internet. The growing popularity of new applications, such as 4K and 360-degree videos, requires that network resources be carefully apportioned among different users in order to optimize Quality of Experience (QoE) and fairness objectives. This results in a challenging online optimization problem, as networks grow increasingly complex and the relevant QoE objectives are often nonlinear functions. Recently, data-driven approaches, deep Reinforcement Learning (RL) in particular, have been successfully applied to network optimization problems by modeling them as Markov decision processes. However, existing RL algorithms involving multiple agents do not address objectives that are nonlinear functions of the different agents’ rewards. To this end, we leverage MAPG-finite, a policy gradient algorithm designed for multi-agent learning problems with nonlinear objectives. It allows us to optimize bandwidth distributions among multiple agents and to maximize QoE and fairness objectives on video streaming rewards. We implement the proposed algorithm and compare the MAPG-finite strategy with a number of baselines, including static, adaptive, and single-agent learning policies. The numerical results show that MAPG-finite significantly outperforms the baseline strategies with respect to different objective functions and in various settings, including both constant and adaptive bitrate videos. Specifically, MAPG-finite improves QoE by 15.27% and fairness by 22.47% compared to the standard SARSA algorithm for a 2000 KB/s link.
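
The abstract does not say which objective functions the paper uses. As a rough illustration of what a nonlinear objective on different agents' rewards can look like, the sketch below compares a plain sum of rewards with a proportional-fairness score (the sum of log rewards). Both function names, the choice of proportional fairness, and the reward values are assumptions made for this example and are not taken from the paper.

```python
import math


def total_reward(rewards):
    """Linear objective: the plain sum of per-agent rewards."""
    return sum(rewards)


def proportional_fairness(rewards):
    """Nonlinear objective (assumed for illustration): sum of log rewards.

    Rewards must be strictly positive for the logarithm to be defined.
    """
    return sum(math.log(r) for r in rewards)


# Two hypothetical ways of splitting the same total reward between two agents.
even_split = [5.0, 5.0]
skewed_split = [9.0, 1.0]

# The linear objective cannot tell the two allocations apart ...
print(total_reward(even_split), total_reward(skewed_split))

# ... while the nonlinear objective scores the even split higher.
print(proportional_fairness(even_split), proportional_fairness(skewed_split))
```

An objective of this kind is a function of the agents' accumulated rewards and generally does not decompose into a sum of per-step rewards, which is one reason standard RL formulations do not apply to it directly.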

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
