Episode Q0 increases exponentially
Can anyone explain why the episode Q0 in RL increases exponentially after the reward has converged to a suboptimal policy?
Answers (1)
Emmanouil Tzorakoleftherakis
16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
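To illustrate the normalization suggestion, here is a minimal Python sketch. The thread concerns MATLAB's Reinforcement Learning Toolbox, but the idea is language-independent. `RunningNormalizer`, `scale_action`, and `scale_reward` are hypothetical helpers written for this example, not toolbox functions. The sketch assumes observations are standardized with a running mean and standard deviation (Welford's algorithm), rewards are divided by a running standard deviation, and the policy outputs actions in [-1, 1] that are mapped to the environment's range. Keeping these quantities on roughly unit scales is one common way to keep critic targets, and therefore Q0, from growing unchecked.

```python
import math
from typing import List, Sequence


class RunningNormalizer:
    """Hypothetical helper: running mean/std per dimension via Welford's algorithm."""

    def __init__(self, size: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean
        self.eps = eps          # avoids division by zero for constant inputs

    def update(self, x: Sequence[float]) -> None:
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self) -> List[float]:
        # Sample standard deviation; fall back to 1.0 until two samples exist.
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [math.sqrt(m / (self.count - 1)) for m in self.m2]

    def normalize(self, x: Sequence[float]) -> List[float]:
        # Standardize each observation component to roughly zero mean, unit variance.
        return [(xi - mu) / (s + self.eps) for xi, mu, s in zip(x, self.mean, self.std())]


def scale_reward(r: float, norm: RunningNormalizer) -> float:
    # Divide by the running std only (no mean shift): shifting rewards by a
    # constant can change how attractive it is to end an episode early.
    return r / (norm.std()[0] + norm.eps)


def scale_action(a: Sequence[float], low: Sequence[float], high: Sequence[float]) -> List[float]:
    # Map a policy output in [-1, 1] to the environment's [low, high] range.
    return [lo + (ai + 1.0) * 0.5 * (hi - lo) for ai, lo, hi in zip(a, low, high)]


if __name__ == "__main__":
    obs_norm = RunningNormalizer(2)
    for obs in ([1.0, 100.0], [3.0, 300.0], [5.0, 500.0]):
        obs_norm.update(obs)
    print(obs_norm.mean)                       # [3.0, 300.0]
    print(obs_norm.normalize([3.0, 300.0]))    # [0.0, 0.0]

    reward_norm = RunningNormalizer(1)
    for r in (10.0, 30.0, 50.0):
        reward_norm.update([r])
    print(scale_reward(30.0, reward_norm))     # approximately 1.5

    print(scale_action([0.0], [-10.0], [10.0]))  # [0.0]
```

Whether rewards should also be mean-centered depends on the task; for episodic environments with early termination, scaling without shifting is the more conservative choice.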
Hope this helps.
Category: Training and Simulation