TD3 agent always outputs boundary action values during training
After training with the TD3 algorithm, the actions always come out at a boundary value, or are completely wrong, regardless of whether the reward curve converged during training. My state values are in the range 0-20000 and the action bounds are 0-15000. What is going wrong? Is my custom environment created incorrectly, or is it something else? Do I need to normalize the inputs and outputs?
Answers (1)
UDAYA PEDDIRAJU
14 Mar 2024
Hi 泽宇,
Regarding your issue with the TD3 algorithm, where the actions always sit at the boundary values regardless of whether the reward curve converges, it's essential to investigate a few potential factors:
- Action Bounds: Ensure that the action bounds are correctly defined in the action specification (see the actor sketch after this list). If the bounds are too restrictive, the agent may struggle to learn effective actions.
- Normalization: Normalizing the inputs and outputs can significantly improve training stability. Consider normalizing both state and action values to a common range (e.g., [0, 1]).
- Custom Environment: Verify that your custom environment is implemented correctly. Double-check the reward function, state representation, and action space.
- Exploration Noise: TD3 relies on exploration noise to encourage exploration. Ensure that the noise level is appropriate for your action range during training (a sketch follows the documentation link below).
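One common reason for actions sticking to a boundary is an actor network whose output layer is unbounded, so the raw output is simply clipped to the limits in the action specification. Below is a minimal sketch of an actor that enforces the 0-15000 bound explicitly with a tanhLayer followed by a scalingLayer; the layer sizes and the scalar action dimension are assumptions for illustration, not your actual model.

```matlab
% Action specification: scalar action bounded to [0, 15000] (range assumed from the question)
actInfo = rlNumericSpec([1 1], 'LowerLimit', 0, 'UpperLimit', 15000);
obsInfo = rlNumericSpec([1 1]);   % placeholder observation spec; replace with your own

% Actor network: bound the output with tanh, then map [-1, 1] to [0, 15000].
% scalingLayer computes Scale.*x + Bias, so Scale = 7500 and Bias = 7500.
actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    tanhLayer
    scalingLayer('Scale', 7500, 'Bias', 7500)
];

actor = rlContinuousDeterministicActor(actorNet, obsInfo, actInfo);
```

If the final layer is a plain fullyConnectedLayer, large critic gradients tend to push its output well outside the limits, and the clipped action then sits at a bound.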
For more details, you can refer to the TD3 agents documentation: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html.
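As a rough illustration of the normalization and exploration-noise points above, the sketch below rescales an observation from the 0-20000 range to [0, 1] before it is returned from a custom step function, and sets the TD3 exploration noise relative to the action range. The variable names are hypothetical; the noise settings use the ExplorationModel properties of rlTD3AgentOptions.

```matlab
% Inside a custom environment step function: return a normalized observation.
% Raw state is assumed to lie in [0, 20000], as described in the question.
rawState = 12000;                 % example raw state value
nextObs  = rawState / 20000;      % scaled to [0, 1]

% TD3 exploration noise: keep the standard deviation modest relative to the
% range of actions the actor can output (here assumed to be [0, 15000]).
agentOpts = rlTD3AgentOptions;
agentOpts.ExplorationModel.StandardDeviation          = 0.05 * 15000;  % ~5% of the action range
agentOpts.ExplorationModel.StandardDeviationDecayRate = 1e-4;          % anneal the noise over training
```

If both observations and actions are normalized, the noise standard deviation can instead be expressed on the normalized scale (for example 0.05 for actions in [0, 1]).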