Reinforcement Learning Toolbox: DDPG Agent, Q0 =0 during the whole training (more than 5000 iterations)

Question

Christian Idzik on 5 Aug 2019


https://kr.mathworks.com/matlabcentral/answers/474916-reinforcement-learning-toolbox-ddpg-agent-q0-0-during-the-whole-training-more-than-5000-iteratio

Commented: Rik on 2 Oct 2020

[Attached image: EpisodeManager_Q0_0.png (screenshot of the Episode Manager)]

I implemented a DDPG agent in MATLAB's Reinforcement Learning Toolbox with a custom environment.

At the beginning I used only a few neurons per hidden layer (8-60) and learning rates between 0.1 and 10 for the critic and actor.

But training did not converge, so I increased the number of neurons per hidden layer (300-400) and decreased the learning rate to about 0.0001.

The results are better, but training still does not converge.
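
As a rough, hedged illustration of why learning rates of 0.1 to 10 tend to be far too large for neural-network actors and critics, here is a toy gradient-descent sketch on a one-dimensional quadratic loss L(w) = a*w^2/2. It is an assumption-laden toy model, not the toolbox's optimizer: each step multiplies w by (1 - lr*a), so plain gradient descent diverges whenever lr*a > 2 and crawls when lr*a is tiny. The function name `gradient_descent` is hypothetical.

```python
def gradient_descent(lr, curvature=1.0, w0=1.0, steps=50):
    """Minimize the toy loss L(w) = curvature * w**2 / 2 with plain gradient descent.

    Toy model (an assumption for illustration only): each step computes
    w <- w - lr * dL/dw, which equals w * (1 - lr * curvature).
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w  # dL/dw for the quadratic loss
        w = w - lr * grad
    return w


for lr in (10.0, 0.1, 1e-4):
    print(f"lr={lr:g}: |w| after 50 steps = {abs(gradient_descent(lr)):.3g}")
```

In this toy model, lr = 10 blows up, lr = 0.1 converges, and lr = 1e-4 barely moves in 50 steps. That mirrors the trade-off in the question: a very small rate avoids divergence but may need many more updates before any progress is visible.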

However, I noticed that Q0 does not change during training. Maybe that is causing some problems.

Q0 stays at 0 throughout the whole training. Attached is a screenshot of the Episode Manager.

Interestingly, Q0 did change during training with the 'old' setup (8-60 neurons per hidden layer and learning rates between 0.1 and 10).
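
For context: in the Episode Manager, Episode Q0 is, for agents that have a critic, the critic's estimate of the discounted long-term reward at the start of each episode, given the initial observation. A well-trained critic's Q0 should therefore move toward the discounted return the agent actually collects from that start state, G0 = r0 + gamma*r1 + gamma^2*r2 + ... The minimal sketch below computes G0 from one episode's per-step rewards; the function name, reward values, and discount factor are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return G0 = sum_k gamma**k * r_k for one episode's per-step rewards."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 0.5, -0.2, 2.0]  # hypothetical per-step rewards
print(discounted_return(rewards, gamma=0.99))
```

Under this reading, a Q0 that stays at exactly 0 while episode rewards are clearly nonzero would suggest that the critic's output is not tracking this quantity, which points toward the critic network and its training rather than the actor. This is an interpretation, not a diagnosis confirmed in the thread.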

Does anyone have any idea what went wrong?

Does anyone have any tips for me?

Thanks in advance!

1 Comment

Rik on 2 Oct 2020

Why is this thread such a magnet for spam? 1 caught by the spam filter, and 6 not (I'll delete those now as well).


Answer 1

Sai Sri Pathuri on 8 Aug 2019


https://kr.mathworks.com/matlabcentral/answers/474916-reinforcement-learning-toolbox-ddpg-agent-q0-0-during-the-whole-training-more-than-5000-iteratio#answer_386744

The problem may not be caused by EpisodeQ0. A DDPG agent may not learn anything for some time during the early episodes, and DDPG agents typically show a dip in cumulative reward early in training. They may only show signs of learning after the first few thousand episodes.
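
One ingredient that can contribute to this slow start is how DDPG stabilizes its learning targets: the critic is trained against slowly moving copies (target networks) of the actor and critic, updated by the soft rule theta_target <- tau*theta + (1 - tau)*theta_target. After n updates toward a fixed online weight, the target has closed only 1 - (1 - tau)^n of the gap. The minimal sketch below uses a single scalar weight and tau = 1e-3, an assumed typical small value; in the toolbox the corresponding setting is the TargetSmoothFactor option of rlDDPGAgentOptions (check the documentation for your release). The function name `soft_update` is hypothetical.

```python
def soft_update(target, online, tau):
    """DDPG-style soft target update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# Hypothetical single-weight "networks": online weight fixed at 1.0, target starts at 0.0.
target, online, tau = [0.0], [1.0], 1e-3
for _ in range(1000):
    target = soft_update(target, online, tau)

# After n updates the target equals 1 - (1 - tau)**n, i.e. the fraction of the gap closed.
print(target[0])
```

Even after 1000 updates the target weight has covered only about two thirds of the distance, which is one reason value estimates can lag for many episodes before learning becomes visible.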

See the following page for tips on configuring the actor and critic representations:

https://in.mathworks.com/help/reinforcement-learning/ug/create-policy-and-value-function-representations.html#mw_c64e5319-105c-4052-96d1-8df734d7d59d

You may also refer to the following link for a related answer:

https://in.mathworks.com/matlabcentral/answers/461100-reinforcement-learning-toolbox-intialise-experience-buffer
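
The related answer concerns the experience buffer. As a generic sketch (an assumption about the mechanism, not the toolbox's internal code), an off-policy agent such as DDPG stores transitions in a bounded first-in-first-out buffer and trains on random mini-batches drawn from it; in this sketch no mini-batch is available until the buffer holds at least one batch worth of experiences, so the very first steps produce no updates at all. The class and method names below are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal FIFO experience buffer (generic sketch, not the toolbox's implementation)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest experience once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.buffer.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        """Return a random mini-batch, or None if too few experiences are stored."""
        if len(self.buffer) < batch_size:
            return None  # not enough data yet: the learning step is skipped
        return random.sample(list(self.buffer), batch_size)


buf = ReplayBuffer(capacity=10_000)
for t in range(100):
    buf.add(obs=t, action=0.0, reward=0.0, next_obs=t + 1, done=False)

batch = buf.sample(64)
print(len(batch))  # prints 64
```

The bounded capacity also means old, poorly exploring transitions are eventually replaced by newer ones, so what the critic learns from shifts gradually over training.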

1 Comment

Christian Idzik on 8 Aug 2019

Thank you very much


