Deep Deterministic Policy Gradient Agents (DDPG in Reinforcement Learning): actor output oscillates a few times, then gets stuck at the minimum.

23 views (last 30 days)
Hi
I am not experienced with Simulink and RL. I have tried to simulate a very simple scenario to test DDPG before implementing my more complex system. The agent is randomly placed around (0,0) and the goal is to move to (500,500) or nearby.
But it doesn't work for me. The action output (2x1) should be continuous in the range [-2, 2]. For the first few episodes the output oscillates between the maximum and minimum, and then it stays at the minimum for the rest of the episodes.
I changed the deep network settings as well as the RL options, but the problem persists. I also changed the output range to (-inf, inf) with saturation, but the result is the same. I have simulated it for a few thousand episodes, still with the same problem.
My code is attached.

Accepted Answer

Anh Tran
Anh Tran on 2 Apr 2020
Edited: Anh Tran on 2 Apr 2020
A few points I identified in your original script:
  • You should include the action bounds when defining the action specification. Note that with this information the DDPG agent does not automatically adjust the gain of the tanh output; it only saturates the output to this range.
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',-2,'UpperLimit',2);
You can refer to the DDPG Pendulum example to see how a scalingLayer is included after the tanhLayer to scale the action range to [-2, 2] (the same as in your case).
  • The noise model is unstable. To create a stable Ornstein-Uhlenbeck noise model, ensure that abs(1 - MeanAttractionConstant.*SampleTime) is less than or equal to 1.
Due to the points above (unstable noise model, no action bounds), I observed Inf actions from your original setup, which later propagated to NaN weights in the network. I changed MeanAttractionConstant to 1/30 and it works fine. A minimal sketch combining both fixes is shown below.
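Here is a rough sketch of both fixes in MATLAB. The values of numAct and Ts are assumptions for illustration; adjust them to match the attached Main.m.
% Sketch only: numAct and Ts are assumed, not taken from the attached model
numAct = 2;
Ts = 0.1;                                  % agent sample time (assumed)
% 1) Bounded action specification (the agent saturates, it does not rescale)
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',-2,'UpperLimit',2);
% In the actor network, scale the tanh output from [-1,1] to [-2,2] yourself,
% as in the DDPG Pendulum example:
%   ... tanhLayer('Name','actorTanh')
%       scalingLayer('Name','actorScaling','Scale',2) ...
% 2) Stable Ornstein-Uhlenbeck noise: keep abs(1 - MeanAttractionConstant*Ts) <= 1
agentOpts = rlDDPGAgentOptions('SampleTime',Ts);
agentOpts.NoiseOptions.MeanAttractionConstant = 1/30;
assert(abs(1 - agentOpts.NoiseOptions.MeanAttractionConstant*Ts) <= 1, ...
    'OU noise model would be unstable for this sample time')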

More Answers (1)

Emmanouil Tzorakoleftherakis
Emmanouil Tzorakoleftherakis on 30 Mar 2020
Edited: Emmanouil Tzorakoleftherakis on 30 Mar 2020
Hi Samir,
After reviewing your model, if you check the actions the agent outputs, they blow up to infinity. That should not be possible given that the last layer in your actor is a tanh layer. The problem is actually in the plant dynamics. In some instances the observations that are fed to the agent are NaN, which leads to this behavior.
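If it helps, one quick way to check is to log the observation signal and scan it for non-finite samples. This is just a sketch; the obsLog variable and the To Workspace logging step are assumptions about your model, not part of it.
% Assumed: the observation signal is logged to the workspace as an
% N-by-numObs array obsLog (e.g. via a To Workspace block with 'Array' format)
badSteps = any(~isfinite(obsLog), 2);      % rows containing NaN or Inf
if any(badSteps)
    fprintf('First non-finite observation at step %d\n', find(badSteps, 1));
else
    disp('No NaN/Inf observations found in this run.');
end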
Hope that helps
  1 Comment
SAMir
SAMir on 31 Mar 2020
Many thanks, Emmanouil. I think if I change Main.m line 25 (or 29), I can limit the action output, e.g.:
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',[-5 -5]','UpperLimit', [5 5]');
It looks like the RL agent adjusts a gain for the tanh output based on these limits.
Also, I have monitored the observations but couldn't find any NaN. The only NaN shown is for Episode Q0. Am I wrong?
