필터 지우기
필터 지우기

PPOAgentの実装について

조회 수: 2 (최근 30일)
shoki kobayashi
shoki kobayashi 2020년 9월 11일
댓글: shoki kobayashi 2020년 9월 13일
現在、simlinkの自作環境をPPOAgentを用いて制御しようとしています。
しかし、下のようなエラーが発生し上手くいかない状況が続いています。
どのように改善すればよろしいでしょうか。
エラー: rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
エラー: rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
自分のコード
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;
Tf = 20;
mdl = 'fivelinkrl';
open_system(mdl)
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
%createPPOAgent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ...
'Weights',weights.criticFC1, ...
'Bias',bias.criticFC1)
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
'Weights',weights.criticFC2, ...
'Bias',bias.criticFC2)
reluLayer('Name','CriticRelu2')
fullyConnectedLayer(1,'Name','CriticOutput',...
'Weights',weights.criticOut,...
'Bias',bias.criticOut)];
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
'Observation',{'observations'},criticOpts);
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1',...
'Weights',weights.actorFC1,...
'Bias',bias.actorFC1)
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2',...
'Weights',weights.actorFC2,...
'Bias',bias.actorFC2)
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(numAct,'Name','Action',...
'Weights',weights.actorOut,...
'Bias',bias.actorOut)
softmaxLayer('Name','actionProbability')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%% ↓error %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observations'}, actorOptions);
%%%% ↑error %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.02,...
'MiniBatchSize',64,...
'NumEpoch',3,...
'AdvantageEstimateMethod','gae',...
'GAEFactor',0.95,...
'SampleTime',0.05,...
'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);
%TrainAgent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxEpisodes,...
'MaxStepsPerEpisode',maxSteps,...
'ScoreAveragingWindowLength',250,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',maxEpisodes,...
'SaveAgentCriteria','EpisodeCount',...
'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
Result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);

답변 (1개)

Toshinobu Shintai
Toshinobu Shintai 2020년 9월 11일
PPOエージェントのactionは離散でなければなりませんので、actionInfoは、例えば以下のように定義します。
actInfo = rlFiniteSetSpec({[-1, -1, -1], [1, 1, 1]});
上記の場合は、numActは2となります。numActにはアクションのパターン数を入力します。
制御器の出力としての次元数は、上記の[-1, -1, -1]のベクトルの次元数として指定します。このとき、PPOエージェントは [-1, -1, -1] か [1, 1, 1] を、その時のobservationに応じて選択して出力します。
修正したコードを添付しましたのでご確認ください。
  댓글 수: 1
shoki kobayashi
shoki kobayashi 2020년 9월 13일
返信ありがとうございます。
添付したコードを用いると上手くいきました。
本当に助かりました。
1つ質問なのですが、PPOエージェントのactionを連続系で用いる方法はありますか?

댓글을 달려면 로그인하십시오.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!