Action Clipping and Scaling in TD3 in Reinforcement Learning

Question

laha_M on 10 Dec 2020

https://kr.mathworks.com/matlabcentral/answers/689924-action-clipping-and-scaling-in-td3-in-reinforcement-learning

Commented: laha_M on 13 Dec 2020

Hello,

I am trying to tune my TD3 agent to solve my custom environment. The environment has two actions, defined with rlNumericSpec: the first in [0, 10] and the second in [0, 2*pi).

I am following the architecture in this example:

https://in.mathworks.com/help/reinforcement-learning/ug/train-td3-agent-for-pmsm-control.html

Now I have the following questions.

1. Since tanh outputs values in [-1, 1], should I use a scaling layer at the end of the actor network? Perhaps with the following values:

scalingLayer('Name','ActorScaling1','Scale',[5;pi],'Bias',[5;pi])

2. How should I set up the exploration noise and the target policy noise? What should their variance values be? I am not asking for precisely tuned values, just a reasonable range, given that I have more than one action and the action range is not [-1, 1].

3. How do I clip the actions so that they stay inside the action bounds? I don't see any such option in rlTD3AgentOptions.

In all the TD3 examples I have seen (and in most RL examples in general), the action range is [-1, 1]. I am confused about how to modify the parameters when the action space is not within [-1, 1], as in my case.

Thanks.

Answer 1

Emmanouil Tzorakoleftherakis on 11 Dec 2020

https://kr.mathworks.com/matlabcentral/answers/689924-action-clipping-and-scaling-in-td3-in-reinforcement-learning#answer_572795

Hello,

In general, for DDPG and TD3, it is good practice to include a scalingLayer as the last actor layer to scale and shift the actor's outputs into the desired range.
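As a rough sketch of how the scale and bias could be derived, the snippet below maps a tanh output in [-1, 1] onto each action range using the half-width and midpoint of that range. The limits are the ones stated in the question; the variable names are illustrative only, and the scalingLayer call is guarded so that the arithmetic still runs without Reinforcement Learning Toolbox.

```matlab
% Action limits from the question: [0, 10] and [0, 2*pi).
lowerLimit = [0; 0];
upperLimit = [10; 2*pi];

% tanh outputs lie in [-1, 1], so map them with y = scale.*x + bias.
scale = (upperLimit - lowerLimit) / 2;   % half-width of each range
bias  = (upperLimit + lowerLimit) / 2;   % midpoint of each range

% Check the endpoints: -1 should map to the lower limit, +1 to the upper.
mappedLow  = scale .* (-1) + bias;
mappedHigh = scale .* ( 1) + bias;

% Build the layer only if the toolbox function is on the path.
if exist('scalingLayer', 'file')
    actorScaling = scalingLayer('Name', 'ActorScaling1', ...
                                'Scale', scale, 'Bias', bias);
end
```

For these limits the result is Scale = [5; pi] and Bias = [5; pi], which matches the values proposed in the question.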

To your questions:

1) Yes, you should use the scalingLayer. To specify different scale and bias values for your two outputs, have a look at this example.

2) This section provides some tips on how to set up the exploration variance, e.g. "It is common to have Variance*sqrt(Ts) be between 1% and 10% of your action range".

3) The upper and lower limit options in rlNumericSpec, together with the scalingLayer, ensure that your actions are within the desired range before exploration noise is added. After noise is added, however, the actions can go out of range, which is why it is often necessary to account for that on the environment side. If you are using Simulink, add a Saturation block, for example. In MATLAB, add an if statement and clip the actions if they are out of range.
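To make the rule of thumb in 2) concrete, here is a small sketch that turns the quoted 1% to 10% guideline into a candidate variance interval for each action. The sample time Ts is a hypothetical value, not one from the question, and the variable names are illustrative.

```matlab
Ts = 0.1;                    % hypothetical sample time in seconds (assumption)
actionRange = [10; 2*pi];    % upper limit minus lower limit, per action

% Quoted guideline: Variance*sqrt(Ts) between 1% and 10% of the action range.
varianceLow  = 0.01 * actionRange / sqrt(Ts);
varianceHigh = 0.10 * actionRange / sqrt(Ts);

% Each row is one action: [lowest suggested value, highest suggested value].
disp([varianceLow, varianceHigh])

% How these values are assigned depends on the toolbox release (for example
% through the noise model properties of rlTD3AgentOptions); check the
% documentation for your release rather than relying on a name given here.
```

Because the two actions have different ranges, the result is a vector with one entry per action. This is a starting interval for tuning, not a tuned setting.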

Hope that helps

Comments: 3 (1 earlier comment not shown)

Emmanouil Tzorakoleftherakis on 12 Dec 2020

In the step function, yes. You can just add an if statement, or use "max" or "min".
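A minimal sketch of the "max"/"min" approach, assuming the action limits from the question. The noisy action below is a made-up value standing in for whatever the agent passes to a custom step function.

```matlab
% Action limits from the question.
lowerLimit = [0; 0];
upperLimit = [10; 2*pi];

% Hypothetical action after exploration noise pushed it out of range.
noisyAction = [10.7; -0.3];

% Saturate element-wise: max enforces the lower bound, min the upper bound.
clippedAction = min(max(noisyAction, lowerLimit), upperLimit);
```

Here the first action is pulled back to 10 and the second to 0. Placing this line at the top of the step function keeps the rest of the environment code from ever seeing an out-of-range action.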

laha_M on 13 Dec 2020

OK, thanks.
