Episode Q0 increases exponentially

February 16, 2021


Can anyone explain why Episode Q0 in RL training increases exponentially after the reward has converged to a suboptimal policy?


Emmanouil Tzorakoleftherakis, February 16, 2021


Hello,

Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
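As a rough illustration of the normalization suggestion, here is a minimal Python sketch that keeps a running mean and variance of a scalar signal (an observation component or a reward) and rescales new values with them. The class name `RunningNormalizer` and its methods are hypothetical, not part of any toolbox, and the sketch assumes scalar inputs; it is one common way to normalize, not necessarily what the answer above had in mind.

```python
import math


class RunningNormalizer:
    """Running mean/variance tracker for a scalar signal (Welford's algorithm).

    Hypothetical helper for illustration only.
    """

    def __init__(self, eps=1e-8):
        self.count = 0      # number of samples seen so far
        self.mean = 0.0     # running mean
        self.m2 = 0.0       # running sum of squared deviations from the mean
        self.eps = eps      # small constant to avoid division by zero

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted and scaled by the running mean and std."""
        # Fall back to unit variance until at least two samples are seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


norm = RunningNormalizer()
for reward in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    norm.update(reward)
scaled = norm.normalize(9.0)  # reward expressed in running-std units
```

Keeping rewards and observations on a scale of roughly one tends to keep the critic's targets in a range the network can fit, which is the usual motivation for this kind of preprocessing.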

Hope this helps.

DAMODARAN B.K, February 17, 2021

Edited: DAMODARAN B.K, February 17, 2021

Is Episode Q0 the critic network output or the target value?
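One way to sanity-check a growing Q0 is sketched below in Python. It assumes Episode Q0 is the critic's estimate of the discounted long-term reward at the episode's initial observation, and that the per-step reward is bounded; both are assumptions about the setup, not statements from this thread. The function names are hypothetical. The sketch computes the discounted return actually collected in an episode and the largest return a bounded reward allows, so a Q0 far beyond that bound would point to a diverging critic rather than a genuinely better policy.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of one episode's rewards: r0 + gamma*r1 + gamma^2*r2 + ..."""
    g = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def return_bound(max_abs_reward, gamma):
    """Upper bound on |discounted return| when |reward| <= max_abs_reward (gamma < 1)."""
    return max_abs_reward / (1.0 - gamma)


# Hypothetical episode: three steps with reward 1 and discount factor 0.5.
episode_return = discounted_return([1.0, 1.0, 1.0], 0.5)

# With |reward| <= 1 and gamma = 0.99, no return can exceed about 100 in magnitude.
bound = return_bound(1.0, 0.99)
```

Comparing the logged Q0 against `episode_return` for the same episode, and against `bound`, gives a quick indication of whether the critic is overestimating.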
