
Q-table issues in the example "Q-learning in the basic grid world"

1 view (last 30 days)
Fangyuan Chang on 9 Nov 2020
Commented: Adi Firdaus on 11 Dec 2021
I trained a Q-learning agent in the MATLAB predefined environment "BasicGridWorld", and I have a question about how the Q-table is updated. When I set the number of episodes to 1 and the number of steps per episode to 1, I expect the single updated entry to equal alpha * R, where alpha is the learning rate and R is the immediate reward. This follows from the Q-learning update Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a)): the table is initialized to zero, so after one step the updated entry should reduce to alpha * R. However, the code generates a Q-value different from my expectation. Can anyone help? The code is attached as follows:
rng(0)  % fix the random seed for reproducibility
env = rlPredefinedEnv("BasicGridWorld");

% Tabular Q-value critic
qTable = rlTable(getObservationInfo(env), getActionInfo(env));
critic = rlQValueRepresentation(qTable, getObservationInfo(env), getActionInfo(env));
critic.Options.LearnRate = 0.1;                   % alpha
critic.Options.L2RegularizationFactor = 0;        % no regularization
critic.Options.Optimizer = "sgdm";
critic.Options.OptimizerParameters.Momentum = 0;  % plain SGD

% Q-learning agent options
opts = rlQAgentOptions;
opts.EpsilonGreedyExploration.Epsilon = 0.8;
opts.EpsilonGreedyExploration.EpsilonMin = 0.01;
opts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
opts.DiscountFactor = 0.5;                        % gamma
agent = rlQAgent(critic, opts);

% Train for a single episode consisting of a single step
trainOpts = rlTrainingOptions(...
    'MaxEpisodes', 1, ...
    'MaxStepsPerEpisode', 1, ...
    'StopTrainingCriteria', "AverageReward", ...
    'StopTrainingValue', 30, ...
    'Verbose', true, ...
    'Plots', 'none');
trainOpts.ScoreAveragingWindowLength = 50;
trainingStats = train(agent, env, trainOpts);

% Extract the learned Q-table from the trained agent
trained_critic = getCritic(agent);
trained_table = getLearnableParameters(trained_critic);
trained_qtable = trained_table{1};

% Check the updated Q-value (linear indexing, so multiple nonzero
% entries would be listed individually rather than cross-indexed)
idx = find(trained_qtable ~= 0);
Q_value = trained_qtable(idx)
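For reference, here is a minimal sketch of the one-step update I expect, assuming a zero-initialized table, a single (state, action, reward, next state) transition, and the standard tabular Q-learning rule; R_step is a placeholder for whatever immediate reward the environment actually returned on that step:
% Minimal sketch of the expected one-step tabular Q-learning update.
% Assumptions: the table starts at zero and exactly one transition occurs.
alpha = 0.1;      % critic.Options.LearnRate
gamma = 0.5;      % opts.DiscountFactor
Q_old = 0;        % the visited entry before the update (table starts at zero)
maxQ_next = 0;    % max over next-state actions; still zero before any update
R_step = -1;      % placeholder: substitute the reward actually observed
Q_expected = Q_old + alpha * (R_step + gamma * maxQ_next - Q_old)
% Q_expected reduces to alpha * R_step, which is the value I expect
% to find as the single nonzero entry of trained_qtable.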
Can anyone point out my error?
Thank you very much.

Answers (0)

Release

R2020b
