Episode Q0 increases exponentially
Can anyone explain why Episode Q0 in RL increases exponentially after the episode reward has converged to that of a suboptimal policy?

Answers (1)
Emmanouil Tzorakoleftherakis
16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
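To make the normalization suggestion concrete, here is a minimal sketch. It is written in plain Python rather than MATLAB and does not use any toolbox API; the names `RunningNormalizer` and `max_discounted_return` are hypothetical helpers made up for illustration. It assumes scalar signals and that the per-step reward magnitude is bounded by some `r_max`. As far as I understand it, Episode Q0 is the critic's estimate of the discounted long-term reward at the start of an episode, so with discount factor `gamma < 1` the true value cannot exceed `r_max / (1 - gamma)` in magnitude. A Q0 that keeps growing past that bound suggests the critic is diverging rather than the policy improving.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: running mean/variance (Welford's method) for a scalar signal."""

    def __init__(self, eps=1e-8):
        self.count = 0      # number of samples seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.eps = eps      # guards against division by zero

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted and scaled to roughly zero mean and unit variance."""
        # Population variance; fall back to 1.0 until at least two samples exist.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


def max_discounted_return(r_max, gamma):
    """Upper bound on |discounted return| when |reward| <= r_max and 0 <= gamma < 1."""
    return r_max / (1.0 - gamma)


if __name__ == "__main__":
    norm = RunningNormalizer()
    for obs in [1.0, 2.0, 3.0, 4.0, 5.0]:   # made-up observation values
        norm.update(obs)
    print(norm.normalize(3.0))               # the mean maps to 0
    print(max_discounted_return(1.0, 0.99))  # bound to compare Episode Q0 against
```

In practice the same idea would be applied to each observation channel and to the reward before they reach the agent, and the logged Episode Q0 would be compared against the bound for your own reward scale and discount factor.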
Hope this helps.
Comments: 1
DAMODARAN B.K
17 Feb 2021
Edited: DAMODARAN B.K
17 Feb 2021