My DDPG agent starts applying one single action
Hello, I am new to deep reinforcement learning.
Using RL Toolbox, I am trying to train a DDPG agent to go to a position and stay there (start position = 0, target position = 5). If it goes above 5 or below 0, it gets a big penalty. The agent learns and tries different actions for the first 20 to 30 episodes, and then applies the extreme action (+1) (action space [-1, 1]) at every step for the next 100 episodes, as if it had found the optimal action. This is strange, because constantly applying +1 takes it to the penalty quickly. Even if I let it train for more than 1000 episodes, it comes back to the action +1 every time. My reward function for now is:
reward = -0.1*(reference position - actual position)^2 - 100*(X < 0 or X > 5)
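To make the formula above concrete, here is a minimal sketch of it as a plain Python function. It is not RL Toolbox code. It assumes that "-0,1" is a decimal comma (0.1), that X is the actual position, and that the reference position is the target 5. The function name `reward` and its parameter names are hypothetical and chosen only for illustration.

```python
def reward(position, reference=5.0, lower=0.0, upper=5.0,
           error_weight=0.1, bound_penalty=100.0):
    """Quadratic tracking cost plus a fixed penalty outside [lower, upper].

    Assumptions (not stated in the post): 0,1 means 0.1, X is the
    actual position, and the reference position is the target (5).
    """
    tracking_cost = error_weight * (reference - position) ** 2
    out_of_bounds = position < lower or position > upper
    penalty = bound_penalty if out_of_bounds else 0.0
    return -tracking_cost - penalty


if __name__ == "__main__":
    # Evaluate the reward at the start, midway, at the target, and past the bound.
    for x in (0.0, 2.5, 5.0, 5.5):
        print(f"position={x:4.1f}  reward={reward(x):9.3f}")
```

Under these assumptions the per-step reward at the start position is -2.5, so about 40 steps spent near the start cost as much as one out-of-bounds penalty of 100. Whether that matters depends on whether an episode ends when the agent leaves [0, 5], which is not stated here.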
Comments: 1
nick
November 16, 2023
Hi Mokhtar,
Kindly specify the agent's environment. Also, what is meant by reference position? Do the start and target positions refer to X coordinates? It would be better if you could share the code.
Answers (0)