RL DDPG isn't learning

Emmanuel Swetala

24 April 2021

Hello,

I am trying to train a model to optimise a set of power generation units. I have six generating units, each with a different operation cost, and a load profile that must be met by the sum of the power contributed by the units. Of course, the power deviation is allowed to be within a range of +/- 0.05.

Here is how I have modelled it:

I use the RL agent's six actions as the per-unit (p.u.) power of each generating unit. I multiply each action by a gain (the base value) to get the power in MW, then I use each unit's MW value as the input to its respective operation cost function (quadratic in nature). The observations to the RL agent are the sum of the power minus the load profile (its magnitude in per unit and its integral) and the load profile itself. I define the reward as:

r1 = negative square of the deviation
r2 = +2 if the deviation is within +/- 0.05
r3 = negative sum of the operation costs
Reward = r1 + r2 + r3
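The reward described above can be sketched as follows. This is a minimal Python illustration, not the actual Simulink model: the function names, the single shared base value, the cost form a*P^2 + b*P + c, and the coefficient values are all assumptions made up for the example, since the post does not give them.

```python
from typing import Sequence, Tuple

# Hypothetical quadratic cost coefficients (a, b, c) for six units.
# The real values are not given in the post.
EXAMPLE_COEFFS = [(0.01, 2.0, 10.0)] * 6


def operation_cost(p_mw: float, a: float, b: float, c: float) -> float:
    """Quadratic operation cost of one unit (assumed form a*P^2 + b*P + c)."""
    return a * p_mw ** 2 + b * p_mw + c


def reward(
    actions_pu: Sequence[float],
    load_pu: float,
    base_mw: float,
    coeffs: Sequence[Tuple[float, float, float]],
    tol: float = 0.05,
    bonus: float = 2.0,
) -> Tuple[float, Tuple[float, float, float]]:
    """Return the total reward and its three terms (r1, r2, r3)."""
    # Deviation in per unit: total generated power minus the load.
    deviation = sum(actions_pu) - load_pu

    # r1: negative square of the deviation.
    r1 = -deviation ** 2

    # r2: fixed bonus when the deviation is inside the allowed band.
    r2 = bonus if abs(deviation) <= tol else 0.0

    # r3: negative sum of the operation costs, evaluated on power in MW.
    r3 = -sum(
        operation_cost(p * base_mw, a, b, c)
        for p, (a, b, c) in zip(actions_pu, coeffs)
    )

    return r1 + r2 + r3, (r1, r2, r3)


if __name__ == "__main__":
    total, terms = reward([0.1] * 6, 0.6, 100.0, EXAMPLE_COEFFS)
    print("reward:", total, "terms (r1, r2, r3):", terms)
```

Returning the three terms separately makes it easy to compare their magnitudes at each step, which depends entirely on the real cost coefficients and base value.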

I used DDPG with learning rates of 0.0001 and 0.001 for the actor and critic respectively, a sample time of 0.4, a simulation time of 24, an experience buffer length of 1e6, and a mini-batch size of 256.

The training was run for 15000 episodes but does not converge to the expected power deviation range of +/- 0.05. I also observed that the plot of Q0 versus episode number increases indefinitely.
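For scale, the settings above imply roughly the following amount of experience. This is a back-of-the-envelope sketch in Python that assumes one agent step per sample time and no early episode termination; the variable names are illustrative only.

```python
# Values taken from the post.
sample_time = 0.4
sim_time = 24
episodes = 15000
buffer_length = int(1e6)
mini_batch_size = 256

# Assumed: one agent step per sample time, episodes always run to the end.
steps_per_episode = round(sim_time / sample_time)
total_steps = steps_per_episode * episodes

print("steps per episode:", steps_per_episode)
print("total steps over training:", total_steps)
print("buffer filled during training:", total_steps >= buffer_length)
```

Under these assumptions the whole run collects fewer transitions than the experience buffer can hold, so no experience is discarded during training.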

What might be the problem? Kindly consider that I am a beginner in this area. Thank you.