DDPG multiple action noise variance error

Tech Logg Ding on 6 Nov 2020
Commented: 勇刚 张 on 30 Mar 2022
Hi,
I am developing an adaptive PID controller for a water tank level control system.
The outputs of the RL Agent block are the three controller gains. Because the three gains have very different ranges, I wanted to use a different noise variance for each action, as suggested on the rlDDPGAgentOptions documentation page.
However, when I initiate training, I get the following error:
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
For 'Output Port 1' of 'rlwatertankAdaptivePID/RL Agent/AgentWrapper', the 'outputImpl' method of the System object
'rl.simulink.blocks.AgentWrapper' returned a value whose size [3x3], does not match the value returned by the 'getOutputSizeImpl' method. Either
change the size of the value returned by 'outputImpl', or change the size returned by 'getOutputSizeImpl'.
I defined the agent options as follows:
%% Specify DDPG agent options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.9;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 5e-3;
% Because the action ranges differ widely, the variance is defined
% individually for each action [kp, ki, kd], taking each range into account:
% kp = [-6, 6], range = 12
% ki = [-0.2, 0.2], range = 0.4
% kd = [-2, 2], range = 4
% Rule of thumb: Variance*sqrt(Ts) should be between 1% and 10% of the
% action range
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
agentOptions.NoiseOptions.Variance = [0.8, 0.02, 0.2];
%agentOptions.NoiseOptions.Variance = 0.2;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-4;
How do I work around this?
Note: If I specify only one variance, training runs fine, but the exploration and the achieved results are not good.
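A likely cause, offered as an assumption rather than a confirmed diagnosis: the action in this model is a 3-by-1 column, while the variance is given as a 1-by-3 row. Adding a column and a row in MATLAB triggers implicit expansion and produces a 3-by-3 matrix, which matches the size reported in the error. The sketch below shows that size effect in plain MATLAB, checks the rule of thumb from the code comments, and then sets the variance as a column vector. The [3 1] action dimension and Ts = 1 are assumptions, since neither is shown in the post.

```matlab
% Plain MATLAB: a column plus a row expands to a matrix (implicit expansion).
action   = zeros(3,1);            % assumed action size [3 1] for [kp; ki; kd]
rowNoise = [0.8, 0.02, 0.2];      % 1-by-3, as written in the post
colNoise = [0.8; 0.02; 0.2];      % 3-by-1, same values as a column

disp(size(action + rowNoise))     % 3 3, the size reported in the error
disp(size(action + colNoise))     % 3 1

% Rule-of-thumb check: Variance*sqrt(Ts) within 1% to 10% of each range.
Ts          = 1;                  % assumption: Ts is not shown in the post
actionRange = [12; 0.4; 4];       % ranges of kp, ki, kd from the post
scaled      = colNoise * sqrt(Ts);
inBand      = scaled >= 0.01*actionRange & scaled <= 0.10*actionRange;
disp(inBand.')                    % 1 1 1 when every entry is inside the band

% Requires Reinforcement Learning Toolbox: variance shaped like the action.
agentOptions = rlDDPGAgentOptions;
agentOptions.NoiseOptions.Variance = colNoise;
```

If the action spec were declared as [1 3] instead, the variance would presumably need to be a row to match; the point is only that the two shapes agree.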
Comments: 3
张 冠宇 on 18 Nov 2021
May I ask how to get three actions such as kp, ki, and kd? Should I set the action spec as follows,
actInfo = rlNumericSpec([3 1]);
or as [1 3], or with some other setting?
I ask because I get this error:
Input data dimensions must match the dimensions specified in the corresponding observation and action info specifications.
obsInfo = rlNumericSpec([3 1],... % rlNumericSpec represents continuous action or observation data; rlFiniteSetSpec represents discrete action or observation data.
'LowerLimit',[-inf -inf -inf]',...
'UpperLimit',[ inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1); % number of observations (first dimension of the observation spec)
actInfo = rlNumericSpec([3 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);
% Build the environment interface object
env = rlSimulinkEnv('Load_Freq_Ctrl_rl2','Load_Freq_Ctrl_rl2/RL Agent',...
obsInfo,actInfo);
% Set a custom reset function to randomize the model's reference values.
env.ResetFcn = @(in)localResetFcn(in);
% Specify the simulation time Tf and the agent sample time Ts in seconds.
Ts = 0.2;
Tf = 30;
% Fix the random generator seed for reproducibility.
rng(0)
% Create the DDPG agent
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([3 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% View the critic network configuration.
figure
plot(criticNetwork)
% Specify options for the critic representation using rlRepresentationOptions.
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
% Create the actor
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(3,'Name','Action')
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
% Create the agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
% Train the agent
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);% 'SaveAgentCriteria',"EpisodeReward",'SaveAgentValue',100',
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',5, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',2000); % 155 works better
% Set to true manually
doTraining = true;
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
Thank you.
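勇刚 张 on 30 Mar 2022
Why build the deep network with imageInputLayer() rather than featureInputLayer() as the input layer?
The upper and lower limits in actInfo also need to be declared explicitly, as is done for obsInfo; remember to use column vectors.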
Good luck
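The two suggestions above could look like the following sketch. The gain limits are borrowed from the original question purely as placeholders (an assumption; the load-frequency model may need different bounds), the name 'gains' is hypothetical, and featureInputLayer assumes a release that provides it (R2020b or later).

```matlab
% Requires Reinforcement Learning Toolbox and Deep Learning Toolbox.
% Observation and action specs with limits declared as column vectors.
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-inf; -inf; -inf], ...
    'UpperLimit',[ inf;  inf;  inf]);
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-6; -0.2; -2], ...   % placeholder bounds (assumed)
    'UpperLimit',[ 6;  0.2;  2]);
actInfo.Name = 'gains';                % hypothetical name for [kp; ki; kd]

numObservations = obsInfo.Dimension(1);
numActions      = actInfo.Dimension(1);

% Vector observations can enter the network through a feature input layer.
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(numActions,'Name','actorFC')
    tanhLayer('Name','actorTanh')
    fullyConnectedLayer(numActions,'Name','Action')];
```

The critic's state and action paths would need the same change of input layer, with the layer names kept in step with the 'Observation' and 'Action' names passed to the representation constructors.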
