Training a DDPG agent, and observation values are zero. How do I initialize the first episode so the actions have nonzero starting values?

3 views (last 30 days)
Hello,
I am training a DDPG agent with four actions. My observations have been zero for more than 1000 episodes. I suspect the action values have been zero, and that is keeping the observations at zero. How do I set the action values to some nonzero values at the start of the first episode?
The actions are torque inputs with min and max of ±200, later multiplied by a gain of 100. Is there something I need to do to get the observations to not stay at zero?
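A common cause of near-zero DDPG actions early in training is too little exploration noise relative to the action range. A minimal sketch of one way to check this, assuming the Reinforcement Learning Toolbox `rlDDPGAgentOptions` API (the actor, critic, and the ±200 action range are from the question above; the specific noise values are illustrative, not a recommendation):

```matlab
% Sketch: scale the Ornstein-Uhlenbeck exploration noise to the action range
% so the agent's actions are not effectively zero in early episodes.
opts = rlDDPGAgentOptions;

% StandardDeviation is in action units; with an action range of [-200, 200],
% a noise std of ~0.3 (the default scale for unit ranges) is negligible.
opts.NoiseOptions.StandardDeviation = 60;          % illustrative: ~30% of range
opts.NoiseOptions.StandardDeviationDecayRate = 1e-5;

% actor and critic are assumed to be defined elsewhere for this environment.
agent = rlDDPGAgent(actor, critic, opts);
```

Rather than forcing specific action values into the first episode, increasing the noise standard deviation lets the built-in exploration mechanism produce nonzero torques from the start, which should in turn move the observations away from zero if the plant model responds to torque.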
4 Comments
Bay Jay on 2 Jul 2023
I have a follow-up question.
This is what I know: during training, the episode ends at the end of the simulation time, tf.
Suppose you have an RL problem with no isdone condition, because you just want the agent to learn the "optimal" solution that maximizes the reward, but you want the agent to know that the only termination condition is a specific fixed time tf (tf = 5, fixed and unchanging). How do you set the isdone condition? Do you connect a clock to the isdone input, or do you just leave it unconnected? If it is left unconnected, how does the agent know that that time is the terminating condition? Any recommendation to ensure I am training the agent properly would be appreciated.
Emmanouil Tzorakoleftherakis on 5 Jul 2023
It is not very clear why you would want the agent to learn the termination time of the episode. After training you can always choose to 'unplug' the agent as you see fit.
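For a fixed-duration episode, the isdone signal does not need to encode the time at all: the episode can simply be capped at the number of steps corresponding to tf. A minimal sketch, assuming the Reinforcement Learning Toolbox `rlTrainingOptions` API (the sample time `Ts` is an assumed parameter of the environment, not something stated in the thread):

```matlab
% Sketch: end every episode at a fixed time Tf = 5 s without an isdone
% condition, by capping the episode length in training options.
Tf = 5;      % fixed episode duration from the question
Ts = 0.1;    % assumed agent sample time; use your environment's value

trainOpts = rlTrainingOptions( ...
    "MaxStepsPerEpisode", ceil(Tf/Ts), ...  % episode ends after Tf seconds
    "MaxEpisodes", 2000);                   % illustrative training budget

% trainResults = train(agent, env, trainOpts);  % agent/env defined elsewhere
```

In a Simulink environment the same effect comes from setting the model stop time to Tf and leaving the RL Agent block's isdone input constant at 0; the episode then terminates when the simulation ends, without the agent needing a time-based termination signal.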


Answers (0)
