About GridWorld in RL Toolbox

Question

shoki kobayashi on 14 Jun 2020


https://kr.mathworks.com/matlabcentral/answers/548043-rltoolbox-gridworld

Commented: shoki kobayashi on 28 Jul 2020

I'm having trouble solving a GridWorld with Q-learning.

Using https://jp.mathworks.com/help/reinforcement-learning/ug/train-q-learning-agent-to-solve-basic-grid-world.html as a reference, I wrote a program that solves a GridWorld, but the agent doesn't learn well.

How can I improve it?

% Create the maze
GW = createGridWorld(8,8);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[8,8]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[3,6]";"[3,7]";"[4,3]";"[7,3]";"[6,3]";"[5,3]"];
updateStateTranstionForObstacles(GW)
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(state2idx(GW,"[4,2]"),state2idx(GW,"[5,2]"),:) = 5;
GW.R(state2idx(GW,"[8,3]"),state2idx(GW,"[8,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
% Load the environment
env = rlMDPEnv(GW)
env.ResetFcn = @() 2;
rng(0)
% Q-learning
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
qRepresentation.Options.LearnRate = 1;
agentOpts = rlQAgentOptions;
agentOpts.EpsilonGreedyExploration.Epsilon = .04;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.MaxStepsPerEpisode = 50;
trainOpts.MaxEpisodes= 200;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 101;
trainOpts.ScoreAveragingWindowLength = 30;
doTraining = false;
if doTraining
    % Train the agent.
    trainingStats = train(qAgent,env,trainOpts);
end
% Plot the results
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
sim(qAgent,env)
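For readers less familiar with the GridWorld object: GW.T and GW.R are nS-by-nS-by-nA arrays indexed as (current state, next state, action), which is why GW.T(state2idx(GW,"[2,4]"),:,:) = 0 clears every outgoing transition of [2,4] before the jump to [4,4] is added. The NumPy sketch below mirrors the question's T and R assignments with 0-based indices. The helper `state2idx` here is a hypothetical stand-in, the column-major state ordering is an assumption (it matches ResetFcn = @() 2 meaning '[2,1]'), and the "stay put" dynamics are a placeholder for what createGridWorld actually builds.

```python
import numpy as np

ROWS, COLS = 8, 8
N_S, N_A = ROWS * COLS, 4  # 64 states, 4 actions


def state2idx(row, col):
    # Hypothetical 0-based stand-in for MATLAB's state2idx. Assumes column-major
    # ordering ([1,1], [2,1], ..., [8,1], [1,2], ...), consistent with the question's
    # ResetFcn = @() 2 selecting '[2,1]' in MATLAB's 1-based indexing.
    return (col - 1) * ROWS + (row - 1)


# T[s, s', a]: probability of moving from s to s' under action a.
# Placeholder dynamics only (always stay put); createGridWorld builds the real moves.
T = np.zeros((N_S, N_S, N_A))
for s in range(N_S):
    T[s, s, :] = 1.0

# Jump: every action taken in [2,4] lands in [4,4] with probability 1.
jump_from, jump_to = state2idx(2, 4), state2idx(4, 4)
T[jump_from, :, :] = 0.0
T[jump_from, jump_to, :] = 1.0

# R[s, s', a]: reward for the transition s -> s' under action a.
R = -1.0 * np.ones((N_S, N_S, N_A))             # -1 per step by default
R[state2idx(4, 2), state2idx(5, 2), :] = 5.0    # +5 for [4,2] -> [5,2]
R[state2idx(8, 3), state2idx(8, 4), :] = 5.0    # +5 for [8,3] -> [8,4]
R[:, state2idx(8, 8), :] = 10.0                 # +10 for entering the terminal state [8,8]
```

Because the reward entries depend only on the (s, s') pair, the +5 is paid on every traversal of [4,2] -> [5,2], not just the first one.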


Accepted Answer

Kazuaki Yamada on 28 Jul 2020


https://kr.mathworks.com/matlabcentral/answers/548043-rltoolbox-gridworld#answer_471739

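The agent learned after I made the following changes (line numbers refer to the question's code, counting its first line as line 1):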

Comment out lines 12-13 (the two GW.R(...) = 5 assignments that add +5 rewards).

Change false to true on line 32 (doTraining = false;).

% Create the maze
GW = createGridWorld(8,8);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[8,8]';
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[3,6]";"[3,7]";"[4,3]";"[7,3]";"[6,3]";"[5,3]"];
updateStateTranstionForObstacles(GW)
GW.T(state2idx(GW,"[2,4]"),:,:) = 0;
GW.T(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 1;
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
%GW.R(state2idx(GW,"[4,2]"),state2idx(GW,"[5,2]"),:) = 5; %--- ?
%GW.R(state2idx(GW,"[8,3]"),state2idx(GW,"[8,4]"),:) = 5; %--- ?
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
% Load the environment
env = rlMDPEnv(GW)
env.ResetFcn = @() 2;
rng(0)
% Q-learning
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
qRepresentation.Options.LearnRate = 1;
agentOpts = rlQAgentOptions;
agentOpts.EpsilonGreedyExploration.Epsilon = .04;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.MaxStepsPerEpisode = 50;
trainOpts.MaxEpisodes = 200;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 101;
trainOpts.ScoreAveragingWindowLength = 30;
doTraining = true; %--- must be true, otherwise the if block below is never entered
if doTraining
    % Train the agent.
    trainingStats = train(qAgent,env,trainOpts);
end
% Plot the results
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
sim(qAgent,env)
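The answer marks the two removed +5 rewards with "%--- ?" and does not say why they matter. One plausible reason: GW.R pays a transition reward every time the transition happens, so an agent can farm +5 by stepping [4,2] -> [5,2] and back. The plain-Python sketch below re-creates the question's grid only to add up episode rewards. It assumes N/S/E/W decrease/increase the row/column index, that moves into walls or obstacles leave the agent in place, and that an episode ends on reaching [8,8] or after 50 steps (MaxStepsPerEpisode). The names `step`, `episode_return`, `to_goal`, and `loop` are hypothetical; this is not the toolbox's implementation.

```python
ROWS, COLS = 8, 8
START, GOAL = (2, 1), (8, 8)
OBSTACLES = {(3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (4, 3), (7, 3), (6, 3), (5, 3)}
JUMP = {(2, 4): (4, 4)}                               # any action at [2,4] lands on [4,4]
BONUS = {((4, 2), (5, 2)): 5, ((8, 3), (8, 4)): 5}    # the two +5 transitions
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}  # assumed directions


def step(state, action, bonus):
    """Return (next_state, reward, done) for one move in the assumed grid."""
    if state in JUMP:
        nxt = JUMP[state]
    else:
        dr, dc = MOVES[action]
        cand = (state[0] + dr, state[1] + dc)
        blocked = not (1 <= cand[0] <= ROWS and 1 <= cand[1] <= COLS) or cand in OBSTACLES
        nxt = state if blocked else cand
    # Entering the goal pays +10 (assigned last in the MATLAB code); else bonus or -1.
    reward = 10 if nxt == GOAL else bonus.get((state, nxt), -1)
    return nxt, reward, nxt == GOAL


def episode_return(actions, bonus=BONUS, max_steps=50):
    """Undiscounted sum of rewards for a fixed action sequence."""
    state, total = START, 0
    for action in actions[:max_steps]:
        state, reward, done = step(state, action, bonus)
        total += reward
        if done:
            break
    return total


# Shortest route: east to [2,4], jump to [4,4], then south and east to [8,8].
to_goal = ["E", "E", "E", "S"] + ["S"] * 4 + ["E"] * 4
# Reach [4,2], then shuttle [4,2] <-> [5,2] until the step limit.
loop = ["S", "S", "E"] + ["S", "N"] * 24

print(episode_return(to_goal))            # -1
print(episode_return(loop))               # 94
print(episode_return(loop, bonus={}))     # -50
```

With the bonuses in place, shuttling earns 94 in 50 steps while the 12-step shortest route to [8,8] earns -1, so maximizing reward favors the loop over the goal. With the bonuses removed (bonus={}, i.e. lines 12-13 commented out), the loop drops to -50 and reaching the goal is the better policy. Under the same assumptions, no 50-step episode in this grid scores 101 or more, so trainOpts.StopTrainingValue = 101 would probably never trigger and training would simply run all 200 episodes.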


shoki kobayashi on 28 Jul 2020

I ran it and it worked!

So unless I set doTraining = true; the if statement never runs...

Thank you!
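The effect of doTraining = false is that train is never called, so sim runs the agent with the Q-table it was created with. The toy NumPy sketch below shows the same effect on a hypothetical 5-state corridor (not the question's grid), using tabular Q-learning with the question's settings (learning rate 1, epsilon 0.04, 50 steps per episode, 200 episodes) plus a discount factor of 0.99, which is assumed here. Each update moves Q(s,a) toward r + 0.99 * max Q(s',.). In this sketch the table starts at zero and NumPy's argmax breaks ties toward index 0; no claim is made about how the toolbox breaks ties.

```python
import numpy as np

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, start at 0, goal at 4
LEFT, RIGHT = 0, 1


def step(s, a):
    """Move one cell; walls keep the agent in place. -1 per step, +10 on reaching the goal."""
    s2 = max(s - 1, 0) if a == LEFT else min(s + 1, N_STATES - 1)
    return s2, (10.0 if s2 == GOAL else -1.0), s2 == GOAL


def train(q, episodes=200, max_steps=50, alpha=1.0, gamma=0.99, epsilon=0.04, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; updates q in place."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q[s]))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * np.max(q[s2])   # no bootstrap past the goal
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q


def greedy_reaches_goal(q, max_steps=50):
    """Follow argmax actions from state 0 and report whether the goal is reached."""
    s = 0
    for _ in range(max_steps):
        s, _, done = step(s, int(np.argmax(q[s])))
        if done:
            return True
    return False


do_training = False
q = np.zeros((N_STATES, 2))
if do_training:
    q = train(q)
print(greedy_reaches_goal(q))  # False: the all-zero table always picks LEFT
```

With do_training = False the zero table makes every state a tie, argmax keeps choosing LEFT, and the walk never leaves state 0. Setting do_training = True fills the table so that the greedy walk heads right to the goal, which matches what the asker observed after the fix.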
