- Adjust the exploration noise scale so the agent actually samples actions across the entire action range instead of saturating at the bound (see the exploration-noise sketch after this list).
- Check that the reward function does not implicitly favor the maximum action value: if larger actions always yield higher immediate reward with no offsetting cost, the policy will rationally saturate at the bound.
- Reward clipping is sometimes added to TD3 pipelines to keep value targets bounded, although it is not part of the vanilla algorithm. If the clipping bounds truncate the large negative rewards that penalize extreme actions, the agent never feels the downside of those actions and can get stuck at the maximum action value (see the reward-clipping sketch after this list).
- Implement gradient clipping to prevent large updates that can lead to value explosion (see the gradient-clipping sketch after this list).
- Ensure the target network update rate (the soft-update coefficient tau) is small enough to keep the bootstrapped targets stable (see the soft-update sketch after this list).
- Consider using batch normalization and weight regularization to stabilize the learning process (see the normalization and weight-decay sketch after this list).
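
For the exploration-noise point, one thing to check is how the noise scale relates to the action range. Below is a minimal NumPy sketch, assuming a hypothetical `actor` callable that returns an action array and a symmetric bound `max_action`; scaling the noise standard deviation by the action range keeps exploration meaningful across the whole interval rather than only near the bound.

```python
import numpy as np

def select_action(actor, state, max_action, noise_scale=0.1):
    """Deterministic action plus Gaussian exploration noise, clipped to the bounds."""
    action = actor(state)  # hypothetical actor returning an array in [-max_action, max_action]
    # Scale the noise std by the action range so exploration covers the whole interval.
    noise = np.random.normal(0.0, noise_scale * max_action, size=np.shape(action))
    return np.clip(action + noise, -max_action, max_action)
```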
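
For the reward-clipping point, it can help to print a few clipped transitions and see whether the penalties for extreme actions survive the clipping. A toy illustration with hypothetical bounds:

```python
import numpy as np

def clip_reward(reward, low=-1.0, high=1.0):
    """Clip a scalar reward to [low, high] before storing it in the replay buffer."""
    return float(np.clip(reward, low, high))

# With these bounds, a -50.0 penalty for slamming the actuator against its limit
# is stored as -1.0, so the critic can barely distinguish it from a mild cost.
print(clip_reward(-50.0))  # -1.0
```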
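
For gradient clipping, a common PyTorch pattern is to clip the global gradient norm between `backward()` and `optimizer.step()`. A sketch with placeholder names (the network, optimizer, and loss are assumed to come from your training loop):

```python
import torch

def clipped_update(network, optimizer, loss, max_norm=1.0):
    """One optimization step with global-norm gradient clipping."""
    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients whose combined norm exceeds max_norm, so a single
    # bad batch cannot produce a huge update and blow up the Q-values.
    torch.nn.utils.clip_grad_norm_(network.parameters(), max_norm=max_norm)
    optimizer.step()
```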
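
The target network update rate in TD3 is the Polyak coefficient tau; a small value (0.005 is the commonly used default) keeps the bootstrapped Q-targets stable. A minimal sketch of the soft update:

```python
import torch

def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float = 0.005):
    """Polyak-average online parameters into the target network."""
    with torch.no_grad():
        for target_param, param in zip(target_net.parameters(), online_net.parameters()):
            # A small tau means the target network moves slowly, so the
            # Q-targets change gradually between updates.
            target_param.mul_(1.0 - tau).add_(param, alpha=tau)
```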
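
Normalization and weight-decay can both be added without touching the TD3 update itself. Below is a sketch of a critic MLP with batch normalization on the hidden layers and L2 regularization applied through the optimizer's `weight_decay`; the state and action sizes are arbitrary example values. Note that batch-norm layers behave differently in train and eval mode, so the target critic should be kept in `eval()` mode when computing targets, and some implementations prefer LayerNorm for that reason.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6  # arbitrary example dimensions

# Hypothetical critic: batch normalization on the hidden activations,
# L2 weight regularization via the optimizer's weight_decay.
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Linear(256, 1),
)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4, weight_decay=1e-4)
```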