TD3算法训练时动作总是输出边界值
조회 수: 16 (최근 30일)
이전 댓글 표시
답변 (1개)
UDAYA PEDDIRAJU
2024년 3월 14일
Hi 泽宇,
Regarding your issue with the TD3 algorithm where actions always output at boundary values regardless of whether the reward curve converges.
It’s essential to investigate a few potential factors:
- Action Bounds: Ensure that the action bounds are correctly defined. If the boundaries are too restrictive, the agent might struggle to learn effective actions.
- Normalization: Normalizing the inputs and outputs can significantly impact training stability. Consider normalizing both state and action values to a common range (e.g., [0, 1]).
- Custom Environment: Verify that your custom environment is correctly implemented. Double-check the reward function, state representation, and action space.
- Exploration Noise: TD3 relies on exploration noise to encourage exploration. Ensure that the noise level is appropriate during training.
you can refer to the documentation TD3: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html.
참고 항목
카테고리
Help Center 및 File Exchange에서 Environments에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!