
Implementation of Proximal Policy Optimisation

shoki kobayashi on 11 Sep 2020
Commented: Kashish Dhal on 12 Oct 2021
I am trying to control a custom Simulink environment using a PPO agent.
However, creating the actor fails with the error below, and I have not been able to resolve it.
How can I fix this?
Error: rl.representation.rlStochasticActorRepresentation (line 32)
Number of outputs for a continuous stochastic actor representation must be two times the number of actions.
Error: rlStochasticActorRepresentation (line 139)
Rep = rl.representation.rlStochasticActorRepresentation(...
My code:
clear all
motion_time_constant = 0.01;
mdl = 'fivelinkrl';
open_system(mdl)
Ts = 0.05;
Tf = 20;
agentblk = [mdl '/RL Agent'];
numObs = 15;
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
numAct = 5;
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-10,'UpperLimit',10);
actInfo.Name = 'Action';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
%createPPOAgent
criticLayerSizes = [400 300];
actorLayerSizes = [400 300];
createNetworkWeights;
criticNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(criticLayerSizes(1),'Name','CriticFC1', ...
'Weights',weights.criticFC1, ...
'Bias',bias.criticFC1)
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(criticLayerSizes(2),'Name','CriticFC2', ...
'Weights',weights.criticFC2, ...
'Bias',bias.criticFC2)
reluLayer('Name','CriticRelu2')
fullyConnectedLayer(1,'Name','CriticOutput',...
'Weights',weights.criticOut,...
'Bias',bias.criticOut)];
criticOpts = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,env.getObservationInfo, ...
'Observation',{'observations'},criticOpts);
actorNetwork = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
fullyConnectedLayer(actorLayerSizes(1),'Name','ActorFC1',...
'Weights',weights.actorFC1,...
'Bias',bias.actorFC1)
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(actorLayerSizes(2),'Name','ActorFC2',...
'Weights',weights.actorFC2,...
'Bias',bias.actorFC2)
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(numAct,'Name','Action',...
'Weights',weights.actorOut,...
'Bias',bias.actorOut)
softmaxLayer('Name','actionProbability')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-3);
%%%% ↓error %%%%%%%%%%%%%%%%%
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observations'}, actorOptions);
%%%% ↑error %%%%%%%%%%%%%%%%%%
opt = rlPPOAgentOptions('ExperienceHorizon',512,...
'ClipFactor',0.2,...
'EntropyLossWeight',0.02,...
'MiniBatchSize',64,...
'NumEpoch',3,...
'AdvantageEstimateMethod','gae',...
'GAEFactor',0.95,...
'SampleTime',0.05,...
'DiscountFactor',0.9995);
agent = rlPPOAgent(actor,critic,opt);
%TrainAgent
maxEpisodes = 4000;
maxSteps = floor(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxEpisodes,...
'MaxStepsPerEpisode',maxSteps,...
'ScoreAveragingWindowLength',250,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',maxEpisodes,...
'SaveAgentCriteria','EpisodeCount',...
'SaveAgentValue',maxEpisodes);
trainingStats = train(agent,env,trainOpts);
save('agent.mat', 'agent')
% Result in simulation
simOptions = rlSimulationOptions('MaxSteps',maxSteps);
experience = sim(env,agent,simOptions);
1 Comment
Kashish Dhal on 12 Oct 2021
Could you please post the corrected code for the actor network? I am getting the same error and cannot work out the fix from the comments.


Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Sep 2020
Hello,
It seems you want to use PPO with a continuous action space. If that's the case, your actor network does not have the right architecture. For a stochastic actor with continuous actions, the neural network should end with one path that outputs the 'mean' values and another path that outputs the 'variance' values, so the network has twice as many outputs as there are actions. This is what the error message is reporting. In your case you seem to have only a single path, which ends in a softmax layer with numAct outputs. Please refer to this example to get an idea of how to set up your actor network. Also make sure you are using R2020a (PPO for continuous actions was not available in previous releases, as far as I remember).
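As a rough illustration of the two-path structure, here is a minimal sketch, assuming R2020a with Reinforcement Learning Toolbox and Deep Learning Toolbox. It is not the code from the linked example. The layer names (MeanFC, VarFC, MeanAndVar, and so on) and the variable actionLimit are made up for illustration, and the custom initial weights from createNetworkWeights are left out so that the default initialization is used. The tanh and scaling layers keep the mean within the action limits, and the softplus layer keeps the second output non-negative.

```matlab
% Sketch only: layer names and actionLimit are illustrative assumptions.
numObs = 15;
numAct = 5;
actionLimit = 10;   % matches the +/-10 limits used in the question
obsInfo = rlNumericSpec([numObs 1]);
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-actionLimit,'UpperLimit',actionLimit);
actInfo.Name = 'Action';

% Shared trunk: same hidden layer sizes as in the question
commonPath = [imageInputLayer([numObs 1 1],'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')];

% Mean path: numAct outputs, squashed by tanh and scaled to the action range
meanPath = [fullyConnectedLayer(numAct,'Name','MeanFC')
    tanhLayer('Name','MeanTanh')
    scalingLayer('Name','MeanScale','Scale',actionLimit)];

% Variance path: numAct outputs, kept non-negative by softplus
varPath = [fullyConnectedLayer(numAct,'Name','VarFC')
    softplusLayer('Name','VarSoftplus')];

% Concatenate along dimension 3 (the channel dimension after imageInputLayer),
% giving 2*numAct outputs with the means first
outLayer = concatenationLayer(3,2,'Name','MeanAndVar');

actorGraph = layerGraph(commonPath);
actorGraph = addLayers(actorGraph,meanPath);
actorGraph = addLayers(actorGraph,varPath);
actorGraph = addLayers(actorGraph,outLayer);
actorGraph = connectLayers(actorGraph,'ActorRelu2','MeanFC');
actorGraph = connectLayers(actorGraph,'ActorRelu2','VarFC');
actorGraph = connectLayers(actorGraph,'MeanScale','MeanAndVar/in1');
actorGraph = connectLayers(actorGraph,'VarSoftplus','MeanAndVar/in2');

actorOptions = rlRepresentationOptions('LearnRate',1e-3);
actor = rlStochasticActorRepresentation(actorGraph,obsInfo,actInfo,...
    'Observation',{'observations'},actorOptions);
```

The mean path must be connected to the first input of the concatenation layer, because the representation reads the means first and the spread values second. The resulting actor can then be passed to rlPPOAgent together with the existing critic.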
Hope that helps
1 Comment
shoki kobayashi on 24 Sep 2020
I was able to get it working.
Thank you very much.


More Answers (0)
