2 out of 7 Observations Defined in MATLAB DDPG Reinforcement Learning Environment. Are the rest given random values?

After reading up on Deep Deterministic Policy Gradient (DDPG), I found this example on MATLAB:
My question is the following: in DDPG, we feed the observation into the actor to get our actions. The MATLAB environment has 7 observations: x, y, dx, dy, sin, cos, dtheta. However, only x and y are assigned at the beginning. Does that mean the rest are given random values before being passed to the critic network? If my understanding is wrong, could someone please explain what is occurring in this model? Thank you

Accepted Answer

Emmanouil Tzorakoleftherakis, 2 July 2020
Hello,
I am assuming you are referring to the initialization of x and y inside the "flyingRobotResetFcn" function. Basically, if you are using a Simulink model as your environment (as in this case), there is no need to initialize any of the observations yourself; the initial conditions are determined directly by the values in your Simulink blocks. However, it is good practice to change the initial conditions at every episode so that the agent gets exposed to different scenarios. Reinforcement Learning Toolbox lets you do that through the reset function mechanism. So what is happening here is that the reset function changes x0 and y0, while the remaining observations keep the values determined in the Simulink model.
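For illustration, here is a minimal sketch of what such a reset function could look like. The function and variable names (flyingRobotResetFcn, x0, y0) come from the example discussed above; the specific randomization (a random point on a circle of assumed radius 15) is just one possible choice, not necessarily what the shipped example uses:

function in = flyingRobotResetFcn(in)
    % 'in' is a Simulink.SimulationInput object passed in by the toolbox
    % at the start of each episode.
    % Randomize the initial position (the radius of 15 is an assumed value).
    t0 = 2*pi*rand();
    in = setVariable(in, 'x0', 15*cos(t0));
    in = setVariable(in, 'y0', 15*sin(t0));
    % Observations with no setVariable call here (dx, dy, sin, cos, dtheta)
    % simply keep the initial values defined in the Simulink model.
end

Assuming your Simulink environment object is named env, you attach the function with:

env.ResetFcn = @(in) flyingRobotResetFcn(in);

With this in place, the toolbox calls the reset function before every training episode, so only x0 and y0 vary between episodes while everything else starts from the model defaults.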
Hope that helps.

Additional Answers (0)


Release

R2020a
