The reward gets stuck on a single value during training or randomly fluctuates (Reinforcement Learning)

Question

Vasiliy Polushkin, April 28, 2020

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/521467-the-reward-gets-stuck-on-a-single-value-during-training-or-randomly-fluctuates-reinforcement-learni

Edited: Ari Biswas, June 13, 2021

I am training a reinforcement learning agent, and the reward plot shows stretches during which the reward does not change at all. This does not look normal, especially compared with the examples (Biped Robot, etc.). I suspect that some rlDDPGAgentOptions settings are responsible, but I believe I have tried changing every available setting, and even after several thousand episodes the agent does not learn. What could cause the reward plot to behave this way during training?


Answer 1

Ari Biswas, May 5, 2020

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/521467-the-reward-gets-stuck-on-a-single-value-during-training-or-randomly-fluctuates-reinforcement-learni#answer_430438

It could mean that training is stuck in a local minimum. You can try a few things:

1. Change the OU noise options to favor more exploration so that the robot can explore more states and get new rewards.

2. Design a different reward function that is not too dependent on sparse rewards. From the graph (flatlines) it looks like you have a sparse reward for a state that the agent is continuously visiting.

In most cases, designing a better reward function will improve training. That said, 350 episodes might be too early to expect good results. I would let training run for at least a few thousand episodes before concluding that something needs to change.
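
To illustrate point 2, here is a minimal sketch in Python (not toolbox code) of how a sparse reward can produce the flat lines described above, and how adding a dense distance term gives the agent a signal in every state. The one-dimensional task, the function names, and the values of `tolerance` and `distance_weight` are all assumptions made up for this example; they are not taken from the original model.

```python
def sparse_reward(position, goal, tolerance=0.05):
    """Bonus only when the goal is reached; every other state scores 0."""
    return 1.0 if abs(goal - position) <= tolerance else 0.0


def shaped_reward(position, goal, tolerance=0.05, distance_weight=0.1):
    """Same bonus, plus a dense penalty that shrinks as the agent nears the goal."""
    distance = abs(goal - position)
    bonus = 1.0 if distance <= tolerance else 0.0
    return bonus - distance_weight * distance


goal = 1.0  # hypothetical target position
for position in (0.0, 0.5, 0.9, 1.0):
    # Away from the goal the sparse reward is identical for every state,
    # while the shaped reward still distinguishes "closer" from "farther".
    print(position, sparse_reward(position, goal), shaped_reward(position, goal))
```

With the sparse version, an agent that keeps visiting non-goal states sees the same reward whatever it does, which shows up as a flat line. The shaped version changes with every step toward the goal, although the weight on the dense term usually needs tuning so that it does not dominate the bonus.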

Comments: 4 (2 older comments not shown)

Abd Al-Rahman Al-Remal, June 12, 2021

Hi,

When you say to change the noise options to favour more exploration, how would this be implemented? That is, which parameters should be changed, and in what direction?

My case is slightly different from the OP's, however: my agent just stays at the same reward value consistently (although I have never run it for more than 100 episodes or so).

Many thanks!

Ari Biswas, June 13, 2021

Edited: Ari Biswas, June 13, 2021

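For a DDPG agent, you can tune the StandardDeviation and StandardDeviationDecayRate parameters of the noise options. A larger StandardDeviation and a smaller StandardDeviationDecayRate keep the exploration noise larger for longer. Please see the documentation for instructions.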

https://www.mathworks.com/help/reinforcement-learning/ref/rlddpgagentoptions.html#mw_2875b71d-bfb0-4be4-b0d3-a44592c3cb30_head
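
As a rough illustration of how these two parameters interact, the sketch below simulates Ornstein-Uhlenbeck-style noise in plain Python. It is not the toolbox implementation: the update rule, the per-step multiplicative decay, and the default values (`mean_attraction`, `sample_time`, `std_dev_min`) are assumptions for this example, and the function names are hypothetical. Check the linked documentation for the exact formula your release uses.

```python
import math
import random


def simulate_ou_noise(std_dev, decay_rate, steps, sample_time=1.0,
                      mean=0.0, mean_attraction=0.15, std_dev_min=0.0, seed=0):
    """Return (noise samples, final standard deviation) for an assumed OU update."""
    rng = random.Random(seed)
    noise = 0.0
    samples = []
    for _ in range(steps):
        # Pull the noise back toward its mean, then add a random kick.
        noise += (mean_attraction * (mean - noise) * sample_time
                  + std_dev * rng.gauss(0.0, 1.0) * math.sqrt(sample_time))
        samples.append(noise)
        # Assumed decay: shrink the standard deviation a little every step.
        std_dev = max(std_dev * (1.0 - decay_rate), std_dev_min)
    return samples, std_dev


def half_life_steps(decay_rate):
    """Number of steps until the standard deviation halves under that decay."""
    return math.log(0.5) / math.log(1.0 - decay_rate)


for decay_rate in (1e-3, 1e-5):
    samples, final_std = simulate_ou_noise(0.3, decay_rate, steps=1000)
    spread = max(abs(value) for value in samples)
    print(decay_rate, round(half_life_steps(decay_rate)), final_std, spread)
```

Under these assumptions, a decay rate that looks small can still remove most of the noise within a few episodes if each episode has many steps. Comparing the half-life in steps with the total number of training steps is a quick check on whether the agent stops exploring too early.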
