Could you help clarify the terminology and usage of Exploratory Policy and Exploratory Model in TD3 Reinforcement Learning

Question

Bay Jay, 27 Nov 2023


https://kr.mathworks.com/matlabcentral/answers/2052772-could-you-help-clarify-the-terminology-and-usage-of-exploratory-policy-and-exploratory-model-in-td3

Commented: Bay Jay, 22 Jan 2024

The TD3 agent has an exploratory model whose noise parameters we set. In the default PMSM control example, UseExplorationPolicy is set to 0.

Also, for policy generation after training, UseExplorationPolicy has to be set to 0. What is the right procedure during training? Is UseExplorationPolicy supposed to be 1 or 0 during training, and what is the effect on the exploratory model (noise) when UseExplorationPolicy is set to 0? Thanks.


Answer 1

Emmanouil Tzorakoleftherakis, 21 Dec 2023


https://kr.mathworks.com/matlabcentral/answers/2052772-could-you-help-clarify-the-terminology-and-usage-of-exploratory-policy-and-exploratory-model-in-td3#answer_1375752

The other answer is correct, with a small caveat: even if UseExplorationPolicy is set to 0, the agent will still explore during training (we take care of that under the hood). After training, the property returns to its original value, that is, the value you set. Essentially, this parameter only affects what happens when you run simulations (after training) or when you manually call 'getAction'.
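To make the 'getAction' point concrete, here is a minimal sketch. It assumes Reinforcement Learning Toolbox and Deep Learning Toolbox are installed, and it uses a default, untrained TD3 agent built from made-up observation and action specifications (the 4-by-1 observation and the [-1, 1] action range are illustrative, not taken from the PMSM example).

```matlab
% Hypothetical specs: 4-by-1 observation, scalar action bounded to [-1, 1].
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);

% Default TD3 agent with automatically generated actor and critics.
agent = rlTD3Agent(obsInfo, actInfo);

obs = {rand(4, 1)};   % observations are passed as a cell array

% Greedy policy: the actor output is returned without exploration noise,
% so repeated calls with the same observation give the same action.
agent.UseExplorationPolicy = false;
aGreedy1 = getAction(agent, obs);
aGreedy2 = getAction(agent, obs);

% Exploration policy: noise drawn from the agent's exploration model is
% added to the actor output before the action is returned.
agent.UseExplorationPolicy = true;
aExplore = getAction(agent, obs);
```

With the flag set to false, aGreedy1 and aGreedy2 should be identical. With it set to true, aExplore will generally differ from them, unless the noisy action ends up clipped to the same action limit.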

Hope this helps


Bay Jay, 22 Jan 2024

Thank you


Answer 2

Venu, 12 Dec 2023


https://kr.mathworks.com/matlabcentral/answers/2052772-could-you-help-clarify-the-terminology-and-usage-of-exploratory-policy-and-exploratory-model-in-td3#answer_1370219

Edited: Venu, 12 Dec 2023

Hi @Bay Jay,

The correct procedure during training for the TD3 agent is to set the exploratory policy to 1.

When the exploratory policy is set to 1, it enables the agent to use the base agent exploration policy, which incorporates the exploratory model for noise parameters. This enables the agent to explore its action and observation spaces by introducing "stochastic" action selection, thus encouraging exploration during training.

When the exploratory policy is set to 0, the agent uses the base agent greedy policy, which results in "deterministic" action selection. In this case, the exploratory model (noise) does not influence the agent's actions: the agent behaves deterministically, selecting the action with maximum likelihood, and does not explore its action and observation spaces.
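For reference, the exploratory model mentioned here is configured through the agent options. The sketch below assumes a release in which rlTD3AgentOptions exposes ExplorationModel with StandardDeviation-style properties (older releases used Variance-style names instead), and the numeric values are purely illustrative.

```matlab
% Create default TD3 options and adjust the exploration noise model.
opts = rlTD3AgentOptions;

% Illustrative values only; tune them for your own action range.
opts.ExplorationModel.StandardDeviationMin       = 0.01;  % floor for the decayed noise
opts.ExplorationModel.StandardDeviation          = 0.2;   % initial noise level
opts.ExplorationModel.StandardDeviationDecayRate = 1e-4;  % decay rate of the noise level

% Display the resulting noise model.
disp(opts.ExplorationModel)
```

These settings define the noise that the exploration policy adds to the actor output. They play no role when the greedy policy is used.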

In the "rlTD3Agent" documentation linked below, "generatePolicyFunction" is the function used to create a policy function for deployment.

https://www.mathworks.com/help/reinforcement-learning/ref/rl.agent.rltd3agent.html

https://www.mathworks.com/help/reinforcement-learning/ref/rl.policy.rlmaxqpolicy.generatepolicyfunction.html

Setting the "UseExplorationPolicy" property to true during training ensures the agent behaves stochastically; when setting up the policy for deployment, setting the property to false ensures the agent behaves deterministically.
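A rough sketch of the deployment step might look like the following. It assumes a release in which generatePolicyFunction accepts an agent directly, and it uses an untrained default agent with made-up specifications as a stand-in for your trained one. Note that the call writes evaluatePolicy.m and agentData.mat to the current folder.

```matlab
% Stand-in for a trained agent (hypothetical specs, default networks).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);
agent   = rlTD3Agent(obsInfo, actInfo);

% Use the greedy (deterministic) policy for deployment.
agent.UseExplorationPolicy = false;

% Generate the policy evaluation function with the default file names:
% evaluatePolicy.m and agentData.mat in the current folder.
generatePolicyFunction(agent);

% Evaluate the generated policy for one observation.
action = evaluatePolicy(rand(4, 1));
```

In practice you would replace the stand-in agent with the agent returned by your training run.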

Hope this helps!


Bay Jay, 22 Jan 2024

Thank you

