
Difference between RL Agent training plot and result plot

16 views (last 30 days)
sungho park on 17 Jan 2022
Hi, the first graph below shows the action during training, and the second one shows a different action after training: it is just constant.
Can you please help me?
%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);
%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-50,'UpperLimit',50);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);
%% Create the environment
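% NOTE: 'mdl' (the Simulink model name) and 'param' (a struct with fields
% dt and end_time) are assumed to be defined earlier in the script or in
% the base workspace; they are not shown in the post.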
blk = [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn = @(in)setVariable(in,'current',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts = param.dt;
Tf = param.end_time;
rng(0)
%% Create DDPG Agent
statePath = [
featureInputLayer(numObs,'Normalization','none','Name','observations')
fullyConnectedLayer(200,'Name','CriticStateFC1')
reluLayer('Name', 'CriticRelu1')
fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
featureInputLayer(1,'Normalization','none','Name','action')
fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation', ...
{'observations'},'Action',{'action'},criticOpts);
%% create the actor
actorNetwork = [
featureInputLayer(numObs,'Normalization','none','Name','observations')
fullyConnectedLayer(400,'Name','ActorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(300,'Name','ActorFC2')
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(1,'Name','ActorFC3')
tanhLayer('Name','ActorTanh')
scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))
];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);
%% Create the DDPG agent option
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'ExperienceBufferLength',1e6,...
'SaveExperienceBufferWithAgent',true,...
'DiscountFactor',0.99,...
"ResetExperienceBufferBeforeTraining",false,...
'MiniBatchSize',256);
agentOpts.NoiseOptions.Variance = 0.2;
agentOpts.NoiseOptions.VarianceDecayRate = 0;
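% Note: with VarianceDecayRate = 0, the exploration noise keeps the same
% variance for the entire training run. The noisy actions shown in the
% training plot can therefore look very different from the noise-free
% (greedy) actions the agent outputs when it is simulated after training.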
agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train Agent
maxepisodes = 10;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',5,...
'Verbose',true,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',5000,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',5000);
doTraining = true;
if doTraining
    trainingStats = train(agent,env,trainOpts);
end
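For reference, a minimal sketch of how the trained agent could be simulated to produce the "after training" plot (the simulation code is not included in the post, so this is an assumption about how the result plot was generated):
%% Simulate the trained (greedy) agent -- sketch, not from the original post
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOpts);
% The logged action is returned as a timeseries named after actInfo.Name ('current')
figure
plot(experience.Action.current.Time,squeeze(experience.Action.current.Data))
xlabel('Time (s)'); ylabel('Action (current)')
title('Action of the trained (greedy) policy')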
1 Comment
Emmanouil Tzorakoleftherakis on 24 Jan 2023
How long did you train for? If you only trained for 10 episodes, you should give it more time.
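In code, that would just mean raising MaxEpisodes in rlTrainingOptions before calling train (a rough sketch; 2000 is an arbitrary placeholder, not a value from the original post):
maxepisodes = 2000;   % far more than 10; pick based on how fast the average reward converges
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000);
trainingStats = train(agent,env,trainOpts);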


Answers (0)
