Reinforcement learning action getting saturated at one range of values

Hi all,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. During training the actions get saturated within one narrow band of values.
Example: the action limits are specified as 0.05 to 1, but the action output during training varies only in the range 0 to 0.16 and never leaves that band.
I have attached a capture of the action output during training.
The code is below:
clc;
clear;
close;
%Load the parameters for the Simulink model (assumed to define Ts_agent, Tf, and doTraining)
SPWM_RL_Data;
%Open Simulink Model
mdl = "RL_Debug";
open_system(mdl);
%Create Environment Interface
open_system('RL_Debug/Firing Unit');
%Create Observation specifications
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'Error signals';
%Create Action Specifications
numActions = 6;
actionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05],'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
%Create Simulink environment for observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
%Obtain observation and action specifications from the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
%Create the two critic representations (TD3 uses a pair of Q-value critics)
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
'Observation',{'State'},'Action',{'scale'},actorOptions);
%Ts_agent = Ts;
agentOptions = rlTD3AgentOptions("SampleTime",Ts_agent, ...
"DiscountFactor", 0.995, ...
"ExperienceBufferLength",2e6, ...
"MiniBatchSize",512, ...
"NumStepsToLookAhead",5, ...
"TargetSmoothFactor",0.005, ...
"TargetUpdateFrequency",2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
%T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',8000,...
'ScoreAveragingWindowLength',100);
if doTraining
    trainStats = train(agent,env,trainingOpts);
    save("Agent.mat","agent")
else
    load("Agent.mat","agent")
end
%Simulate the Agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps', maxsteps, 'NumSimulations', 1);
sim(env,agent,simOptions);

Accepted Answer

Emmanouil Tzorakoleftherakis on 15 April 2021
Edited: 20 June 2023
Your scaling layer is not set up correctly. You want to scale by (upper limit - lower limit)/2 and then shift by (upper limit + lower limit)/2 so that the tanh output, which lies in [-1, 1], maps onto [lower limit, upper limit].
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
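For these limits (lower = 0.05, upper = 1) that gives Scale = 0.475 and Bias = 0.525. A minimal sketch of the corrected actor output stage, assuming the same layer names and actionInfo as in the question:
%Corrected actor output stage (sketch, using the question's variable names)
scale = (actionInfo.UpperLimit - actionInfo.LowerLimit)/2;   % 0.475 for limits [0.05, 1]
bias  = (actionInfo.UpperLimit + actionInfo.LowerLimit)/2;   % 0.525 for limits [0.05, 1]
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',scale,'Bias',bias)];  % maps tanh output from [-1,1] onto [0.05,1]
With the original scalingLayer('Scale',actionInfo.UpperLimit), the upper limit is a vector of ones, so the tanh output is multiplied by 1 and the actor's nominal output range stays [-1, 1] instead of [0.05, 1].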
