Parallel reinforcement learning in separate runs leads to a strange learning curve

I'm training a DDPG reinforcement learning agent on an HPC cluster node with Parallel Computing Toolbox, but only for 400 episodes per run, because of errors I encountered earlier when training for many more episodes. I then save the agent, including its experience buffer, and resume training in a loop. I start training with
% keep the replayed experience collected in previous runs
agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
% store the buffer inside the agent so it survives save/load
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
trainingStats = train(agent,env,trainOpts);
and save the agent with
agent.AgentOptions.SaveExperienceBufferWithAgent = true;
save(filename, 'agent', '-v7.3');  % v7.3 MAT-file format supports variables larger than 2 GB
I can see the experience buffer growing across runs, since
agent.ExperienceBuffer.Length
keeps increasing. At the start of each run I use
load(PRE_TRAINED_MODEL_FILE,'agent');
% exp(n*log(1-r)) == (1-r)^n, i.e. the variance the decay schedule
% would have reached after pastepisodes episodes of a single run
agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000] .* ...
    exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));
to reproduce the noise variance decay I would expect from a single uninterrupted training run. The critic learning rate is 5e-3 and the actor learning rate is 1e-3.
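Putting the pieces together, each iteration of my outer loop looks roughly like this (a simplified sketch: numRuns and the pastepisodes bookkeeping are placeholders, not my exact script):

for run = 1:numRuns
    if run > 1
        % resume from the previous run's checkpoint
        load(PRE_TRAINED_MODEL_FILE,'agent');
        % restore the variance a single uninterrupted run would have reached
        agent.AgentOptions.NoiseOptions.Variance = [1200;400;2;1000] .* ...
            exp(pastepisodes*log(1-agentOpts.NoiseOptions.VarianceDecayRate));
    end
    agent.AgentOptions.ResetExperienceBufferBeforeTraining = false;
    agent.AgentOptions.SaveExperienceBufferWithAgent = true;
    trainingStats = train(agent,env,trainOpts);
    save(PRE_TRAINED_MODEL_FILE,'agent','-v7.3');
    pastepisodes = pastepisodes + 400;  % 400 episodes per run
end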
The result is a learning curve I wouldn't expect. It looks as if either the noise variance is reset at the start of each run, or the experience buffer from the previous runs is not actually being used. The reward should reach approximately 1500.
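To narrow down which of the two it might be, the loaded agent can be inspected right before the next train call; this uses only the properties already shown above:

load(PRE_TRAINED_MODEL_FILE,'agent');
% should keep growing from run to run if the buffer is really reused
disp(agent.ExperienceBuffer.Length)
% should match the decayed schedule, not the initial [1200;400;2;1000]
disp(agent.AgentOptions.NoiseOptions.Variance)
% should still be false after loading
disp(agent.AgentOptions.ResetExperienceBufferBeforeTraining)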
Does anybody have an idea why the curve looks like this? Do you have any advice on how to adjust the hyperparameters?

Answers (0)
