Is my DDPG agent learning?

Question

Bryan, May 2, 2024

https://kr.mathworks.com/matlabcentral/answers/2114126-is-my-ddpg-agent-learning

Answered by: Yatharth, May 20, 2024

Hello everyone,

Can I conclude that my agent is learning? The maximum reward per episode is 20.

In the first image, the reward was low (-5) in the first episode, and the average reward starts to increase from episode 80. After episode 100, however, it fluctuates between 5 and 20. Other questions mention that Q0 can help determine whether the agent is learning, and since Q0 approaches the maximum reward, I think the agent could be learning. What makes me doubt this is the fluctuation in rewards after episode 100.

Another thing that makes me doubt whether the agent is learning is that, in a second training session (image 2), the fluctuations in the average reward are more noticeable. Even though Q0 still tends towards the maximum reward (20), in both training sessions the agent continues to receive negative rewards (more than expected).

So it's difficult for me to determine whether the agent is learning. If it is not, what should I modify? The reward? The agent's hyperparameters?

I would greatly appreciate your guidance.

Answer 1

Yatharth, May 20, 2024

https://kr.mathworks.com/matlabcentral/answers/2114126-is-my-ddpg-agent-learning#answer_1460151

Hi Bryan,

It is very hard to pinpoint the exact reason for the sudden drops in episode reward (the sum of the rewards over all steps of an episode) without knowing anything about the environment or the reward function. RL training is stochastic, so in certain episodes the agent is likely entering states that cause early termination or large penalties, and either of these can have a large impact on the cumulative reward. One suggestion is to run a short training session, save the agent information, and check whether the reward function is being evaluated correctly.
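If you can export the per-step rewards of each episode, a quick check along these lines can show whether the low-reward episodes are the ones that end early or contain a single large penalty. This is only a minimal sketch in Python: the log format (a list of per-step reward lists), the function name `summarize_episodes`, the `drop_threshold` parameter, and the numbers are all made up for illustration and do not come from your environment.

```python
from statistics import median

def summarize_episodes(episodes, max_steps, drop_threshold):
    """Return one summary dict per episode.

    episodes: list of non-empty lists of per-step rewards (hypothetical log format).
    max_steps: step limit per episode; shorter episodes are treated as ended early.
    drop_threshold: an episode is flagged when its return is more than this
        amount below the median return of all episodes.
    """
    returns = [sum(steps) for steps in episodes]
    typical = median(returns)
    summaries = []
    for index, steps in enumerate(episodes):
        summaries.append({
            "episode": index,
            "return": returns[index],
            "steps": len(steps),
            "ended_early": len(steps) < max_steps,
            "is_drop": returns[index] < typical - drop_threshold,
            "worst_step_reward": min(steps),
        })
    return summaries

# Made-up log: three ordinary episodes and one that stops early after a penalty.
logged = [
    [1.0] * 20,
    [1.0] * 18 + [0.5, 0.5],
    [1.0] * 5 + [-10.0],
    [1.0] * 19 + [0.0],
]
report = summarize_episodes(logged, max_steps=20, drop_threshold=5.0)
drops = [row for row in report if row["is_drop"]]
for row in drops:
    print(row)
```

If the flagged episodes are consistently short or dominated by one very negative step, the drops are more likely caused by a termination condition or penalty term than by the agent failing to learn.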

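Fluctuations can also be a sign that the agent is still exploring the environment. It is essential to balance exploration with exploitation, so you might need to adjust the parameters related to exploration. For a DDPG agent, exploration comes from noise added to the actions, so these are the agent's noise options, such as the noise standard deviation and its decay rate.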
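To get a feel for how long exploration stays strong, it can help to work out how quickly the noise shrinks for a given decay rate. The sketch below assumes a simple multiplicative schedule in which the standard deviation is scaled by `(1 - decay_rate)` at every step and never drops below a floor. The function names and the example values are hypothetical, and the schedule is an assumption for illustration, so check the documentation for the exact rule your agent uses.

```python
import math

def noise_std(step, initial_std, decay_rate, min_std=0.0):
    """Standard deviation after `step` multiplicative decays, floored at min_std."""
    return max(initial_std * (1.0 - decay_rate) ** step, min_std)

def steps_to_halve(decay_rate):
    """Smallest number of steps after which the std is at most half its initial value."""
    return math.ceil(math.log(0.5) / math.log(1.0 - decay_rate))

# Compare a few example decay rates (values chosen only for illustration).
for rate in (1e-3, 1e-4, 1e-5):
    print(rate, steps_to_halve(rate), noise_std(10_000, 0.3, rate))
```

Comparing the number of steps needed to halve the noise with the total number of steps in your training run gives a rough idea of whether the agent is still exploring heavily in the later episodes, which would be one possible explanation for the fluctuations.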

Refer to the following documentation, which provides further details about the training algorithm: https://www.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html
