TD3 agent always outputs boundary action values during training

Question

泽宇 on February 29, 2024


https://kr.mathworks.com/matlabcentral/answers/2088376-td3

Commented: 泽宇 on April 23, 2024

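After I finish training with the TD3 algorithm, the agent's action is always a boundary value or is otherwise completely wrong, regardless of whether the reward curve converged during training. My state values lie in 0-20000 and my action bounds are 0-15000. Where is the problem? Is the custom environment set up incorrectly, or is it something else? Do I need to normalize the inputs and outputs?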


Li on December 22, 2024

Has the original poster solved this? I have run into the same problem and would appreciate any advice.


Answer 1

UDAYA PEDDIRAJU on March 14, 2024


https://kr.mathworks.com/matlabcentral/answers/2088376-td3#answer_1425266

Hi 泽宇,

Regarding your issue with the TD3 agent always outputting actions at the boundary values, regardless of whether the reward curve converges, a few potential factors are worth investigating:

Action Bounds: Ensure that the action bounds are correctly defined. If the boundaries are too restrictive, the agent might struggle to learn effective actions.
Normalization: Normalizing the inputs and outputs can significantly impact training stability. Consider normalizing both state and action values to a common range (e.g., [0, 1]).
Custom Environment: Verify that your custom environment is correctly implemented. Double-check the reward function, state representation, and action space.
Exploration Noise: TD3 relies on exploration noise to encourage exploration. Ensure that the noise level is appropriate during training.
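
As a rough illustration of the normalization point, here is a minimal sketch in plain MATLAB. It assumes the ranges given in the question (state in 0-20000, action in 0-15000) and assumes the actor's last layer produces values in [-1, 1] (for example after a tanh); the helper names normalizeState and denormalizeAction are made up for this example and are not toolbox functions.

```matlab
% Assumed ranges, taken from the question: state 0-20000, action 0-15000.
stateMin = 0;  stateMax = 20000;
actMin   = 0;  actMax   = 15000;

% Hypothetical helper: map a raw state to [0, 1] before it reaches the networks.
normalizeState = @(s) (s - stateMin) ./ (stateMax - stateMin);

% Hypothetical helper: map an actor output in [-1, 1] back to physical units.
denormalizeAction = @(a) actMin + (a + 1) .* (actMax - actMin) ./ 2;

sNorm = normalizeState(5000);     % 0.25
aPhys = denormalizeAction(0);     % 7500, the middle of the action range
```

With this kind of mapping the networks only ever see values of order one, and an actor output of 0 corresponds to the middle of the action range rather than to a bound. In Reinforcement Learning Toolbox a similar mapping is often built into the end of the actor network with a tanh layer followed by a scaling layer; check the documentation for the exact options.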

You can refer to the TD3 documentation: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html


泽宇 on April 23, 2024

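Thank you very much for your answer. My problem is still unresolved. I am using the MATLAB Reinforcement Learning Toolbox to train an agent in a custom environment. In the first training episode (before any reward has been received), the agent outputs actions inside the action bounds. In the second episode (after it has received the reward from the first one), however, the actions it outputs are the boundary values of the action range, and this continues from the second episode through the n-th. Why is that? Here is my simplified code; could you help me see where the problem is?

function [Observation,Reward,IsDone,NextState] = newgoushi(Action,State)
    E = State;

    %% Reward
    GT = 1000*Action(1);
    NextState = E - GT;

    % Reward is 0 when GT - E < 0.1, otherwise -1
    if GT-E < 0.1
        Reward = 0;
    else
        Reward = -1;
    end

    % The episode ends as soon as the reward is 0
    IsDone = Reward >= 0;
    Observation = NextState;
end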

My action is continuous and bounded between 0 and 12000; my state is also continuous and bounded between 5000 and 10000.
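
One way to check whether the boundary actions come from the reward definition itself is to evaluate the posted reward rule over a few sample actions. The sketch below is only a probe, not a fix: rewardFcn re-states the condition GT-E<0.1 from the posted function as an anonymous function, and rewardAbsFcn is a hypothetical variant that assumes the intent was to reward GT being close to E, which the thread does not confirm.

```matlab
% Reward rule as posted: 0 when 1000*action - state < 0.1, otherwise -1.
rewardFcn = @(action, state) double(1000*action - state < 0.1) - 1;

% Hypothetical variant (an assumption about intent): reward only when GT is close to E.
rewardAbsFcn = @(action, state) double(abs(1000*action - state) < 0.1) - 1;

state   = 5000;                    % lower end of the stated state range
actions = [0 1 5 6 6000 12000];    % sample points across the stated action range

rewards    = rewardFcn(actions, state);     % [ 0  0  0 -1 -1 -1]
rewardsAbs = rewardAbsFcn(actions, state);  % [-1 -1  0 -1 -1 -1]
disp([actions; rewards; rewardsAbs]);
```

Under the posted rule, every action at or below roughly state/1000 earns the maximum reward of 0 and ends the episode, and that includes the lower bound 0. An agent that settles on a boundary value may therefore simply be maximizing the reward as written. With the hypothetical absolute-value variant, only a very narrow band of actions is rewarded, so the reward signal becomes sparse and a shaped reward (for example one proportional to the negative distance) may be easier to learn from.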
