AIMS Energy (Mar 2015)

Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach

  • Bei Li,
  • Siddharth Gangadhar,
  • Pramode Verma,
  • Samuel Cheng

DOI
https://doi.org/10.3934/energy.2015.1.162
Journal volume & issue
Vol. 3, no. 1
pp. 162–172

Abstract

In Smart Grid environments, homes equipped with windmills are encouraged to generate energy and sell it back to utilities. Time of Use pricing and the introduction of storage devices greatly influence a user's decisions about when to sell back energy and how much to sell. A study of sequential decision-making algorithms that can optimize the user's total payoff is therefore necessary. In this paper, reinforcement learning is used to tackle this optimization problem. The problem of determining when to sell back energy is formulated as a Markov decision process, and the model is learned adaptively using Q-learning. Experiments are conducted with storage devices of varying capacity and under periodic energy generation rates with different levels of fluctuation. The results show a notable increase in the total discounted reward from selling back energy with the proposed approach.
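To make the formulation concrete, here is a minimal tabular Q-learning sketch on a toy instance. The abstract does not give the paper's exact MDP, so everything below is an assumption for illustration: the state is (time slot, stored energy), the action is the number of units sold in that slot, the reward is the Time of Use price times the units sold, and energy that exceeds the storage capacity is curtailed. The names `PRICES`, `GENERATION`, `CAPACITY`, `available`, `step`, `greedy`, `q_learning` and `discounted_return` are hypothetical, as are all numeric values.

```python
import random

# Hypothetical toy instance (not from the paper): one "day" of T slots with
# Time of Use prices and a periodic windmill generation profile.
PRICES = [1.0, 1.0, 3.0, 3.0, 2.0, 1.0]  # sell-back price per unit in each slot
GENERATION = [2, 1, 0, 1, 2, 2]          # units generated in each slot
CAPACITY = 4                             # storage capacity in units
T = len(PRICES)


def available(t, level):
    """Energy on hand in slot t after adding generation; any excess is curtailed."""
    return min(CAPACITY, level + GENERATION[t])


def step(t, level, sell):
    """Sell `sell` units in slot t; return (next state, reward)."""
    remaining = available(t, level) - sell
    return ((t + 1) % T, remaining), PRICES[t] * sell


def greedy(Q, t, level):
    """Action with the highest Q-value (ties go to the smallest sale)."""
    actions = range(available(t, level) + 1)
    return max(actions, key=lambda a: Q.get((t, level, a), 0.0))


def q_learning(episodes=2000, steps=60, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}  # (t, level, action) -> estimated discounted return
    for _ in range(episodes):
        # Random start states improve coverage of the small state space.
        t, level = rng.randrange(T), rng.randint(0, CAPACITY)
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randint(0, available(t, level))  # explore
            else:
                a = greedy(Q, t, level)                  # exploit
            (t2, level2), r = step(t, level, a)
            best_next = Q.get((t2, level2, greedy(Q, t2, level2)), 0.0)
            old = Q.get((t, level, a), 0.0)
            # Q-learning update toward r + gamma * max_a' Q(s', a').
            Q[(t, level, a)] = old + alpha * (r + gamma * best_next - old)
            t, level = t2, level2
    return Q


def discounted_return(policy, steps=60, gamma=0.9):
    """Discounted reward of following policy(t, level) from an empty store at slot 0."""
    t, level, total = 0, 0, 0.0
    for k in range(steps):
        (t, level), r = step(t, level, policy(t, level))
        total += gamma ** k * r
    return total


if __name__ == "__main__":
    Q = q_learning()
    learned = discounted_return(lambda t, level: greedy(Q, t, level))
    sell_all = discounted_return(available)  # baseline: sell everything immediately
    print(f"learned: {learned:.2f}, sell-immediately: {sell_all:.2f}")
```

In this toy setting, the learned policy tends to hold energy through the cheap slots and sell in the high-price slots. Comparing it against a "sell everything immediately" baseline mirrors, in spirit only, the kind of discounted-reward comparison the abstract describes; the numbers it prints say nothing about the paper's reported results.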

Keywords