Episode Q0 increases exponentially

Can anyone explain why Episode Q0 in RL increases exponentially after the episode reward has converged to that of a suboptimal policy?

Answers (1)

Emmanouil Tzorakoleftherakis on 16 Feb 2021

Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
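As a rough illustration of the normalization idea, here is a minimal Python sketch. It is not Reinforcement Learning Toolbox code: the names `RunningNormalizer`, `scale_action`, and `unscale_action` are hypothetical helpers made up for this example, and the sample reward values are arbitrary. It assumes scalar signals, standardizes them with a running mean and variance, and maps a bounded action range to [-1, 1]. Keeping rewards at a moderate scale tends to keep the critic's targets, and therefore its value estimates, at a moderate scale as well.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: running mean/variance of a scalar (Welford's algorithm)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # guards against division by zero

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted and scaled by the statistics seen so far."""
        # Population variance; fall back to 1.0 until two samples are seen.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


def scale_action(a, low, high):
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


def unscale_action(a_norm, low, high):
    """Map a normalized action in [-1, 1] back to [low, high]."""
    return low + (a_norm + 1.0) * (high - low) / 2.0


if __name__ == "__main__":
    reward_norm = RunningNormalizer()
    for raw_reward in [100.0, 200.0, 300.0]:  # arbitrary sample rewards
        reward_norm.update(raw_reward)
    print(reward_norm.normalize(300.0))
    print(scale_action(5.0, 0.0, 10.0))
```

The same normalizer can be applied per observation channel. The action mapping is only meaningful when the actuator limits `low` and `high` are known in advance.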
Hope this helps.

Comments: 1

DAMODARAN B.K on 17 Feb 2021
Edited: DAMODARAN B.K on 17 Feb 2021
Is Episode Q0 the critic network output or the target value?

Asked: 16 Feb 2021

Edited: 17 Feb 2021
