Reinforcement Learning Toolbox: Episode Q0 stopped predicting after a few thousand simulations. DQN Agent.
The Episode Q0 values looked reasonable until about episode 2360. After that, Q0 is not stuck, but it increases only very slowly.
I'm using the default generated DQN agent (with continuous observations and discrete actions) with only a few modifications. I'm not sure I understand what the issue is here, or whether this is the correct behaviour and it means my agent has converged to a somewhat stable result.
From the documentation, I understood that Episode Q0 should give a prediction of the "true discounted long-term reward". I assumed this meant the discounted reward for each individual episode, regardless of whether training has converged, but maybe I misunderstood something.
Please help clarify. I made several runs, and they all display the same behaviour after a few thousand episodes (not always the same number).
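For reference, here is a minimal sketch of what a "discounted long-term reward" looks like numerically. The reward sequence is made up for illustration and is not taken from the training runs described here; the only value borrowed from this post is the discount factor of 0.1, and 0.99 is included purely for comparison.

```matlab
% Hypothetical per-step rewards for one episode (illustrative values only).
rewards = ones(1, 10);
k = 0:numel(rewards)-1;          % step index, starting at 0

gammaLow  = 0.1;                 % DiscountFactor used in this post
gammaHigh = 0.99;                % assumed value, for comparison only

% Discounted return from the first step: sum over k of gamma^k * r_k
G0_low  = sum(gammaLow.^k  .* rewards);
G0_high = sum(gammaHigh.^k .* rewards);

fprintf('gamma = %.2f: discounted return = %.4f\n', gammaLow,  G0_low);
fprintf('gamma = %.2f: discounted return = %.4f\n', gammaHigh, G0_high);
```

With a constant reward of 1 and a discount factor of 0.1, the discounted return can never exceed 1/(1 - 0.1), about 1.11, no matter how long the episode runs, so a value estimate under such a small discount factor mostly reflects the first one or two rewards.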
____
These are the only changes I made:
```matlab
critic.Options = rlRepresentationOptions(...
    'LearnRate',1e-3,...
    'GradientThreshold',1,...
    'UseDevice','gpu');

% extract agent options
agentOpts = agent.AgentOptions;

% modify agent options
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 0.005;
agentOpts.DiscountFactor = 0.1;

% recreate the agent with the new options
agent = rlDQNAgent(critic,agentOpts);
```
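To get a feel for the exploration schedule these options imply, the sketch below counts how many training steps it takes for epsilon to reach its floor. It assumes the update rule Epsilon = Epsilon*(1-EpsilonDecay) applied once per training step, a starting Epsilon of 1, and an EpsilonMin of 0.01. The starting value and the floor are assumptions, not values stated in this post; the actual ones can be read from `agentOpts.EpsilonGreedyExploration`.

```matlab
% Assumed starting value and floor; only epsilonDecay comes from this post.
epsilon      = 1;       % assumed initial Epsilon
epsilonMin   = 0.01;    % assumed EpsilonMin
epsilonDecay = 0.005;   % EpsilonDecay set in the options above

steps = 0;
while epsilon > epsilonMin
    % Multiplicative decay, applied once per training step
    epsilon = epsilon*(1 - epsilonDecay);
    steps = steps + 1;
end

fprintf('Epsilon reaches its floor after %d training steps.\n', steps);
```

Under these assumptions, exploration drops to its minimum within roughly the first thousand training steps, which is long before the episode counts discussed in this thread.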
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/642485/image.png)
Emmanouil Tzorakoleftherakis
June 9, 2021
Hello,
This behavior is strange. I would create a technical support case so that we can take a closer look, if possible.
Answers (0)