Parallel reinforcement learning in separate runs leads to a strange learning curve

I'm training a DDPG reinforcement learning agent on an HPC cluster node with Parallel Computing Toolbox, but only for 400 episodes per run, because of errors I encountered earlier when training for many more episodes. I then save the agent, including its experience buffer, and resume training in a loop. I start training with
% keep the replayed experience collected in previous runs
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
% store the buffer inside the agent so it survives save/load
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
trainingStats = train(agent,env,trainOpts);
and save the agent with
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
save(filename, 'agent', '-v7.3');  % v7.3 MAT-file format supports variables larger than 2 GB
I can see the experience buffer growing across runs, since
agent.ExperienceBuffer.Length
keeps increasing. At the start of each run I use
load(PRE_TRAINED_MODEL_FILE,'agent');
% exp(n*log(1-r)) == (1-r)^n, i.e. the variance the decay schedule
% would have reached after pastepisodes episodes of a single run
agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000] .* ...
    exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));
to reproduce the noise variance decay I would expect from a single uninterrupted training run. The critic learning rate is 5e-3 and the actor learning rate is 1e-3.
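Putting the pieces together, each iteration of my outer loop looks roughly like this (a simplified sketch: numRuns and the pastepisodes bookkeeping are placeholders, not my exact script):

for run = 1:numRuns
    if run > 1
        % resume from the previous run's checkpoint
        load(PRE_TRAINED_MODEL_FILE,'agent');
        % restore the variance a single uninterrupted run would have reached
        agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000] .* ...
            exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));
    end
    agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
    agent.AgentOptions.SaveExperienceBufferWithAgent = true;
    trainingStats = train(agent,env,trainOpts);
    save(PRE_TRAINED_MODEL_FILE,'agent','-v7.3');
    pastepisodes = pastepisodes + 400;  % 400 episodes per run
end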
The result is a learning curve I wouldn't expect. It looks as if either the noise variance is reset at the start of each run, or the experience buffer from the previous runs is not actually being used. The reward should reach approximately 1500.
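To narrow down which of the two it might be, the loaded agent can be inspected right before the next train call; this uses only the properties already shown above:

load(PRE_TRAINED_MODEL_FILE,'agent');
% should keep growing from run to run if the buffer is really reused
disp(agent.ExperienceBuffer.Length)
% should match the decayed schedule, not the initial [1200;400;2;1000]
disp(agent.AgentOptions.NoiseOptions.Variance)
% should still be false after loading
disp(agent.AgentOptions.ResetExperienceBufferBeforeTraining)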
Does anybody have an idea why the curve looks like this? Do you have any advice on how to adjust the hyperparameters?

Answers (0)
