Reinforcement learning actions using DDPG

7 views (last 30 days)
Jason Smith on 2 Nov 2020
Commented: Jason Smith on 12 Nov 2020
Greetings. I'm Jason, and I'm working on controlling a bipedal robot using reinforcement learning. I need help deciding between the two methods below for generating exploration actions with DDPG:
1. Generate random actions with a noise variance of 10% of my action range, based on the description of the DDPG noise model.
2. Use a low variance such as 0.5, as used in the MSRA biped and humanoid training with RL.
I would really appreciate your help with this. In the latter case, where the actions are the output of a tanh layer with low variance ([-1.5 1.5]), how is that output converted into the desired actions?
Please note that I'm fairly confident the action range I have calculated is sufficient to solve the problem, and that I have tried higher variances, but they make the learning process less stable. Any suggestions on how I should generate the random actions? A sketch of how I would configure each option is below.
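For reference, here is roughly how I would set up each option with rlDDPGAgentOptions (the Ts and action-range values are placeholders from my setup, so treat this as a sketch, not my final configuration):

    % Sketch of the two candidate exploration settings (values assumed)
    Ts = 0.025;                                       % my sample time
    actionRange = 3;                                  % actions span [-1.5 1.5]
    agentOpts = rlDDPGAgentOptions('SampleTime',Ts);

    % Option 1: noise variance at 10% of the action range
    agentOpts.NoiseOptions.Variance = 0.1*actionRange;

    % Option 2: a fixed low variance as in the biped/humanoid examples
    % agentOpts.NoiseOptions.Variance = 0.5;

    % Optionally decay exploration over time
    agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;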
Thanks in advance for your time and consideration.

Accepted Answer

Emmanouil Tzorakoleftherakis on 11 Nov 2020
Hi Jason,
In the documentation link you provided, it's mentioned that Variance*sqrt(Ts) should be between 1% and 10% of your action range. The biped example you are linking to has Ts = 0.025 and Variance = 0.1, which comes out to about 1% of the action range.
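As a quick sanity check of that rule of thumb (assuming the biped actions span [-1 1], i.e. a range of 2):

    Ts = 0.025;                            % biped example sample time
    variance = 0.1;                        % biped example noise variance
    actionRange = 2;                       % actions span [-1 1]
    explorationLevel = variance*sqrt(Ts);  % ~0.0158
    explorationLevel/actionRange           % ~0.008, i.e. about 1% of the range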
To your second question, please have a look at step 1 here. Effectively, during training only, random noise sampled using the noise options you provide is added to the normal output of your agent. So if your last layer is a tanh layer, you will first get a value in [-1, 1], and noise will be added on top of that.
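If you need actions outside [-1, 1], e.g. [-1.5 1.5], the usual pattern is to follow the tanh layer with a scalingLayer. A minimal sketch (the layer names and numActions are placeholders, not from your model):

    numActions = 6;  % placeholder: number of action channels
    actorOutput = [
        fullyConnectedLayer(numActions,'Name','fc_out')
        tanhLayer('Name','tanh')                      % bounds output to [-1, 1]
        scalingLayer('Name','scale','Scale',1.5)];    % rescales to [-1.5 1.5]

During training, the sampled noise is then added on top of this scaled output.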
Hope that helps.

More Answers (0)
