How does the Q-Learning update the qTable by using the reinforcement learning toolbox?

Question

Tracy Shang on 1 May 2021


https://kr.mathworks.com/matlabcentral/answers/818725-how-does-the-q-learning-update-the-qtable-by-using-the-reinforcement-learning-toolbox

Commented: Adi Firdaus on 10 Dec 2021


'MaxEpisodes' and 'MaxStepsPerEpisode' are both set to 1.

I ran the following code. After the first episode, Q(4,1) is set to -1.

However, when I ran the "train section" again, both Q(4,1) and Q(4,2) were updated, as shown in the following figure.

In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, to -1.

Why is Q(4,2) set to 0.7441?

Why is Q(4,1) also updated, to -1.67?
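For reference, the expectation above follows from the textbook tabular Q-learning rule, assuming LearnRate plays the role of the step size α and DiscountFactor the role of γ. With α = γ = 1 and a zero-initialized table, the update reduces to the TD target. After the first episode the only nonzero entry is Q(4,1) = -1, so the max over any next state's row is 0, and Q(4,1) is not touched in the second episode because action 1 is not taken:

$$
\begin{aligned}
Q(s,a) &\leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right] \\
\alpha = \gamma = 1:\quad Q(s,a) &\leftarrow r + \max_{a'} Q(s',a') \\
Q(4,2) &\leftarrow -1 + 0 = -1
\end{aligned}
$$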

clear
% 4x4 grid world with terminal cell [4,4]
GW = createGridWorld(4,4);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
% Reward of -1 for every transition, +10 for any transition into the terminal cell
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW);
% Tabular critic and Q-learning agent
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic, agentOpt);
% Show the grid and trace the agent's path
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
% Train for one episode of one step
rng(0)
opt = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'Plots', "none",...
    'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%%
% Read back the learned Q-table from the agent's critic
aa = getLearnableParameters(getCritic(agent));


Answer 1

Emmanouil Tzorakoleftherakis on 3 May 2021


https://kr.mathworks.com/matlabcentral/answers/818725-how-does-the-q-learning-update-the-qtable-by-using-the-reinforcement-learning-toolbox#answer_691070


Can you try the following?

critic.Options.L2RegularizationFactor=0;

This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.
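To illustrate the mechanism this answer points to, here is the gradient of a generic L2-regularized loss. This is the textbook form, assumed rather than taken from the toolbox documentation, with the penalty written as (λ/2)‖θ‖² and λ standing for L2RegularizationFactor:

$$
\nabla_\theta\left(\mathcal{L}(\theta) + \tfrac{\lambda}{2}\lVert\theta\rVert^2\right) = \nabla_\theta \mathcal{L}(\theta) + \lambda\,\theta
$$

Because the λθ term applies to every parameter, an entry such as Q(4,1), which is not visited in the second episode, still receives a small nonzero gradient once its own value is nonzero.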

2 Comments

Tracy Shang on 4 May 2021

Edited: Tracy Shang on 4 May 2021


Thanks for your answer!

I tried the code you suggested. The result showed no difference.

But you inspired me!

I tried another parameter, as follows. The qTable was updated as shown in the following figure.

critic.Options.OptimizerParameters.GradientDecayFactor =0;

I then set both parameters by adding the following code, and the qTable was updated as shown in the following figure. At least the question about Q(4,1) is solved.
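A possible reading of why GradientDecayFactor matters, assuming the critic is trained with an Adam-type optimizer (as far as we understand, GradientDecayFactor is the decay rate of Adam's gradient moving average, written β₁ below): each entry keeps a first-moment estimate m_t, and the step uses its bias-corrected value. In the second episode Q(4,1) is not visited, so its gradient g_t is zero, yet m_t = β₁m_{t-1} stays nonzero whenever β₁ > 0, so the entry keeps moving. With β₁ = 0 that carried-over term vanishes:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad \hat m_t = \frac{m_t}{1-\beta_1^{\,t}} \\
\theta_t &= \theta_{t-1} - \alpha\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} \\
g_t = 0 &\;\Rightarrow\; m_t = \beta_1 m_{t-1}
\end{aligned}
$$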

Given the parameters I set, the equation for calculating the Q-value simplifies as follows.

[Equation image not preserved]

Why is Q(4,2) set to -1.4139?

critic.Options.OptimizerParameters.GradientDecayFactor =0;  
critic.Options.L2RegularizationFactor=0;
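A hypothetical way to check the remaining value: the sketch below hand-codes the standard Adam update for the two table entries involved. It is not the toolbox's own code. It assumes a learning rate of 1, SquaredGradientDecayFactor 0.999 and Epsilon 1e-8 (assumed defaults), no L2 term, a TD target of -1 at both steps, and, importantly, an optimizer step counter that continues from the first train call (t = 1) into the second (t = 2). Under these assumptions Adam divides each step by a bias-corrected root-mean-square of past gradients, so the step is not α times the TD error.

```matlab
% Hypothetical sketch, not the toolbox's implementation: hand-coded Adam
% updates for the two table entries Q(4,1) and Q(4,2).
% Assumptions: learning rate 1, SquaredGradientDecayFactor 0.999,
% Epsilon 1e-8, no L2 term, TD target -1 at both steps, and an optimizer
% step counter that continues from the first train call (t = 1) into the
% second (t = 2).
alpha     = 1;          % critic.Options.LearnRate in the question
beta2     = 0.999;      % assumed SquaredGradientDecayFactor
epsAdam   = 1e-8;       % assumed Epsilon (named to avoid shadowing eps)
target    = -1;         % r + max Q(s',:) = -1 + 0 at both steps
beta1List = [0.9 0];    % GradientDecayFactor: assumed default, then 0
visited   = [1 2];      % step 1 updates Q(4,1), step 2 updates Q(4,2)
results   = zeros(numel(beta1List), 2);   % rows: beta1 values; columns: [Q(4,1) Q(4,2)]

for k = 1:numel(beta1List)
    beta1 = beta1List(k);
    q = [0 0];   % table entries, initialized to zero
    m = [0 0];   % first-moment (gradient moving average) estimates
    v = [0 0];   % second-moment (squared-gradient moving average) estimates
    for t = 1:2
        g = [0 0];                          % the unvisited entry has zero gradient
        j = visited(t);
        g(j) = q(j) - target;               % gradient of 0.5*(q(j) - target)^2
        m = beta1*m + (1 - beta1)*g;
        v = beta2*v + (1 - beta2)*g.^2;
        mHat = m / (1 - beta1^t);           % bias-corrected moments
        vHat = v / (1 - beta2^t);
        q = q - alpha * mHat ./ (sqrt(vHat) + epsAdam);
    end
    results(k, :) = q;
end

disp(results)   % approximately [-1.670 -0.744; -1.000 -1.414]
```

Under these assumptions, the first row (GradientDecayFactor 0.9) gives Q(4,1) ≈ -1.67 and Q(4,2) ≈ -0.7441 (the thread reports 0.7441; the sign comes out negative here), and the second row (GradientDecayFactor 0) keeps Q(4,1) at -1 and gives Q(4,2) ≈ -1.4139. The last value comes from the bias correction at t = 2: v̂ ≈ 0.001g²/(1 − 0.999²) ≈ 0.50025g², so the step is about 1/√0.50025 ≈ 1.4139 regardless of the size of g.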

Looking forward to your further answer. Thank you very much!

Adi Firdaus on 10 Dec 2021

I need an answer too.
