No convergence during training using a TD3 RL agent

Question

Gaurav, April 20, 2024


https://kr.mathworks.com/matlabcentral/answers/2109391-no-convergence-during-training-using-an-td3-rl-agent

Answered: Ayush Anand, May 22, 2024

I am trying to train an agent to navigate a multirotor to a particular 3D coordinate. I am using a TD3 agent with the same configuration as in the example Train Biped Robot to Walk Using Reinforcement Learning Agents (MATLAB & Simulink, mathworks.com). In my case, the observation space has 16 elements and the action space has 4. I normalize the observations before passing them to the agent, and the actions output by the agent are also normalized between -1 and 1; I scale them up before passing them to the multirotor environment.

During training, the reward repeatedly drops to zero. I have attached the training results as an image. In the image, you can see that the reward ranges between 0 and 1000 and keeps dropping back to zero; the agent cannot maintain a consistently high reward.

Also, the agent is trained using parallel computing.


Answer 1

Ayush Anand, May 22, 2024


https://kr.mathworks.com/matlabcentral/answers/2109391-no-convergence-during-training-using-an-td3-rl-agent#answer_1461461

The reward continuously dropping to zero suggests that the agent might be struggling with the complexity of the task, the design of the reward function, or issues in the training setup. Here are a few potential causes:

Inadequate reward function: If the reward is sparse (i.e., the agent gets feedback only infrequently) or the reward scale is inappropriate, the agent will fail to learn properly. Make sure the reward is shaped so that the agent receives an informative signal at most time steps.
Exploration strategy: TD3 relies on noise-based exploration, so make sure the exploration noise is scaled appropriately for the action range. It should be large enough to explore but not so large that it causes erratic behavior.
Learning parameters: Try experimenting with the learning rates for the actor and critic networks, as well as with different mini-batch sizes and replay buffer capacities. You could also try adjusting the discount factor (gamma) and the target update frequency.
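
As a rough illustration of where these knobs live, here is a minimal sketch of an `rlTD3AgentOptions` setup. It assumes Reinforcement Learning Toolbox R2022a or later (older releases use `Variance` instead of `StandardDeviation` for the noise model and set learn rates on the representation options instead). All numeric values, the sample time `Ts`, and the `shapedReward` helper with its `posError` and `action` arguments are assumptions for illustration, not settings taken from the question.

```matlab
% Illustrative TD3 settings; every numeric value is a starting point to tune.
Ts = 0.025;                                  % assumed agent sample time (s)

agentOpts = rlTD3AgentOptions;
agentOpts.SampleTime             = Ts;
agentOpts.DiscountFactor         = 0.99;     % gamma
agentOpts.MiniBatchSize          = 256;
agentOpts.ExperienceBufferLength = 1e6;      % replay buffer capacity
agentOpts.TargetSmoothFactor     = 5e-3;     % soft target update rate
agentOpts.TargetUpdateFrequency  = 2;        % steps between target updates
agentOpts.PolicyUpdateFrequency  = 2;        % steps between actor updates

% Exploration noise, sized relative to the normalized [-1, 1] action range.
% The minimum is set first so it never exceeds the standard deviation.
agentOpts.ExplorationModel.StandardDeviationMin       = 0.01;
agentOpts.ExplorationModel.StandardDeviation          = 0.2;
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-5;

% Learning rates and gradient clipping for the actor and both critics.
agentOpts.ActorOptimizerOptions.LearnRate         = 1e-4;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;
for k = 1:numel(agentOpts.CriticOptimizerOptions)
    agentOpts.CriticOptimizerOptions(k).LearnRate         = 1e-3;
    agentOpts.CriticOptimizerOptions(k).GradientThreshold = 1;
end

% Hypothetical dense reward: penalize distance to the target and control effort.
% posError (3x1 position error) and action (4x1) are assumed names.
shapedReward = @(posError, action) -norm(posError) - 0.01*sum(action.^2);
```

Changing one group of settings at a time (noise, then learning rates, then buffer and batch size) makes it easier to see which change actually affects the reward curve.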

You can refer to the following links to explore different options with the "rlTD3" agent in MATLAB:
