How can I tune a PID controller using reinforcement learning?

13 views (last 30 days)
Dario Di Francesco on 7 December 2021
Commented: S Saha on 25 August 2023
Hello, I am trying to extend the example "Tune PI Controller Using Reinforcement Learning" into a "Tune PID Controller Using Reinforcement Learning", but it does not work. What did I do wrong?
I modified the Simulink model watertankLQG.slx into a simpler model called System_PID.slx, and I wrote a script called ziegler_Nichols.m that computes the PID parameters (Kp, Ki, Kd, N) with the Ziegler-Nichols closed-loop method:
%% Closed-loop Ziegler-Nichols example
clear, close all, clc
Ts = 0.1;
Tf = 30;
s = tf('s');
G = (5*(0.5-s))/(s+2)^3;          % non-minimum-phase plant
a = [1 6 12 8]                    % denominator coefficients of G
b = [-5 2.5]                      % numerator coefficients of G
[A,B,C,D] = tf2ss(b,a)            % state-space realization
figure(1)
margin(G)
figure(2)
rlocus(G)
figure (3)
nyquistplot(G), grid on
%% Closed-loop test with proportional gain only
Kp = 1.9692308;
sysCL = feedback(Kp*G, 1)
figure(4)
margin(sysCL)
figure(5)
rlocus(sysCL)
figure(6)
step(sysCL);
% Note: the Nyquist plot passes through the critical point (-1,0)
% when Kp = KU (the ultimate gain)
figure(7)
nyquist(Kp*G), grid on
KU = Kp;
%% Compute the ultimate period TU
[y,t] = step(sysCL,1:5e-3:10);
ii = find(abs(diff(y))<3e-5);     % indices where the oscillation is locally flat (peaks/troughs)
figure(8)
plot(t,y,'linewidth',2), hold on, grid on
plot(t(ii),y(ii),'or');
TU = min(diff(t(ii)))*2;          % peak-to-trough spacing is half the ultimate period
%% PID tuning with the classic Ziegler-Nichols rules
Kp = 0.6*KU;
Ti = TU/2;
Td = TU/8;
Ki = Kp/Ti;
Kd = Kp*Td;
N = 10;                                        % derivative filter coefficient
C_PID = Kp+(Ki/s)+((Kd*s)/(1+s*Td/N));         % parallel PID with filtered derivative
[y1,t1] = step(sysCL,0:1e-4:30);
sysCL2 = feedback(C_PID*G,1);
[y2,t2] = step(sysCL2,0:1e-4:30);
figure(9)
subplot(211)
plot(t1,y1,'linewidth',2);
title(['K_U = ' num2str(KU) ', T_U = ' num2str(TU)])
grid on
subplot(212)
plot(t2,y2,'linewidth',2);
title(['K_P = ' num2str(Kp) ', T_I = ' num2str(Ti) ', T_D = ' num2str(Td)])
grid on
figure(10)
margin(sysCL2)
figure(11)
bode(sysCL2)
figure(12)
rlocus(sysCL2)
figure(13)
nyquistplot(sysCL2)
Kp_Z_N=Kp
Ki_Z_N=Ki
Kd_Z_N=Kd
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_Z_N))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_Z_N))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd_Z_N))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
I copied and pasted the algorithm from this web page: https://it.mathworks.com/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html and made some changes.
I increased the observation dimension to obsInfo = rlNumericSpec([3 1]), and I added lines to read the Kd parameter from the actor, define N = 10, and write these values into the PID block of System_PID:
s = tf('s');
G= (5*(0.5-s))/(s+2)^3
a= [1 6 12 8]
b = [-5 2.5]
[A,B,C,D] = tf2ss(b,a)
mdl = 'rl_PID_Tune';
open_system(mdl)
Ts = 0.1;
Tf = 10;
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObservations = obsInfo.Dimension(1);
numActions = prod(actInfo.Dimension);
rng(0)
initialGain = single([1e-3 2]);   % initial [Ki Kp] gains for fullyConnectedPILayer
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
criticNetwork = localCreateCriticNetwork(numObservations,numActions);
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic2 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic = [critic1 critic2];
agentOpts = rlTD3AgentOptions(...
'SampleTime',Ts,...
'MiniBatchSize',128, ...
'ExperienceBufferLength',1e6);
agentOpts.ExplorationModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agent = rlTD3Agent(actor,critic,agentOpts);
maxepisodes = 100;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-355);
% Train the agent.
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
Kd = abs(parameters{1}(3))
N = 10;
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
%% local Functions
function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error and error';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Build the environment interface object.
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Set a custom reset function that randomizes the reference values for the model.
env.ResetFcn = @(in)localResetFcn(in,mdl);
end
function in = localResetFcn(in,mdl)
% randomize reference signal
blk = sprintf([mdl '/Desired \nValue']);
hRef = 10 + 4*(rand-0.5);
in = setBlockParameter(in,blk,'Value',num2str(hRef));
% randomize initial height
hInit = 0;
blk = [mdl '/block system/System'];
in = setBlockParameter(in,blk,'InitialCondition',num2str(hInit));
end
function criticNetwork = localCreateCriticNetwork(numObservations,numActions)
statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedLayer(32,'Name','fc1')];
actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(32,'Name','fc2')];
commonPath = [
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','reluBody1')
fullyConnectedLayer(32,'Name','fcBody')
reluLayer('Name','reluBody2')
fullyConnectedLayer(1,'Name','qvalue')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','concat/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
end
I modified the scheme from 'rlwatertankPIDTune' and called it 'rl_PID.m'.
But it does not work; this is the error message that the Command Window returns:
Error using dlnetwork/initialize (line 481)
Invalid network.
Error in dlnetwork (line 218)
net = initialize(net, dlX{:});
Error in deep.internal.sdk.dag2dlnetwork (line 48)
dlnet = dlnetwork(lg);
Error in rl.util.createInternalModelFactory (line 15)
Model = deep.internal.sdk.dag2dlnetwork(Model);
Error in rlDeterministicActorRepresentation (line 86)
Model = rl.util.createInternalModelFactory(Model, Options, ObservationNames, ActionNames, InputSize, OutputSize);
Error in rl_PID (line 24)
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
Caused by:
Layer 'Action': Error using 'predict' in layer fullyConnectedPILayer. The function threw an error and could not be executed.
Error using dlarray/fullyconnect>iValidateWeights (line 221)
The number of weights (2) for each output feature must match the number of elements (3) in each observation of the input data.
Error in dlarray/fullyconnect (line 101)
wdata = iValidateWeights(W, xsize, batchDims);
Error in fullyConnectedPILayer/predict (line 21)
Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
What does it mean?
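If I read the last two lines of the trace correctly, the custom fullyConnectedPILayer still has only two weights (the initial [Ki Kp] gains) while the observation now has three elements, so the fully connected operation cannot match them. Below is a minimal sketch of the actor I think would be needed; the three-element initial gain is only my assumption (one starting weight per observation element, e.g. [Ki Kp Kd]), not something taken from the original example:
% Assumption: one initial weight per observation element, so the layer's
% weight vector matches obsInfo = rlNumericSpec([3 1]).
initialGain = single([1e-3 2 1e-3]);   % hypothetical starting [Ki Kp Kd] gains
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain,'Action')];
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
This also assumes that the rl_PID_Tune model really feeds three signals (for example integrated error, error, and error derivative) into the observation port of the RL Agent block; otherwise the dimensions still will not match.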
4 Comments
S Saha on 25 August 2023
What is the basis for selecting the values of the above initial gains for reinforcement-learning PID tuning?
Dario Di Francesco on 25 August 2023
Sorry guys, but you responded a little late :D. I solved all of these problems and used this work for my bachelor thesis two years ago! If you are interested in this work, and in other work on control and machine-learning control techniques, contact me at difrancescodario95@gmail.com or see my LinkedIn profile https://www.linkedin.com/in/dario-di-francesco-89390a142/, and I will pass the work on to you as soon as possible, after I translate it into English. As soon as possible I will also start my website with all my projects. See you later!


Answers (0)
