How can I create and tune a PID controller using reinforcement learning?
Hello, I am trying to extend the example "Tune PI Controller Using Reinforcement Learning" into tuning a PID controller with reinforcement learning, but it does not work. What did I do wrong?
I modified the Simulink model watertankLQG.slx into a simpler model called System_PID.slx, and I wrote a script called ziegler_Nichols.m that computes the PID parameters (Kp, Ki, Kd, N) with the Ziegler-Nichols closed-loop technique:
%% Closed-loop Ziegler-Nichols (Z&N) example
clear, close, clc
Ts=0.1;
Tf=30;
s = tf('s');
G = (5*(0.5-s))/(s+2)^3;        % plant transfer function
a = [1 6 12 8]                  % denominator coefficients of (s+2)^3
b = [-5 2.5]                    % numerator coefficients of 5*(0.5-s)
[A,B,C,D] = tf2ss(b,a)          % equivalent state-space realization
figure(1)
margin(G)
figure(2)
rlocus(G)
figure (3)
nyquistplot(G), grid on
%% Closed-loop test
Kp = 1.9692308;
sysCL = feedback(Kp*G, 1)
figure(4)
margin(sysCL)
figure(5)
rlocus(sysCL)
figure(6)
step(sysCL);
% Observation: the Nyquist plot passes through the critical point
% when Kp = Ku
figure(7)
nyquist(Kp*G), grid on
KU = Kp;
%% Compute TU (ultimate period)
[y,t] = step(sysCL,1:5e-3:10);
ii = find(abs(diff(y))<3e-5);
figure(8)
plot(t,y,'linewidth',2), hold on, grid on
plot(t(ii),y(ii),'or');
TU = min(diff(t(ii)))*2;
%% PID tuning (Ziegler-Nichols rules)
Kp = 0.6*KU;
Ti = TU/2;
Td = TU/8;
Ki = Kp/Ti;
Kd = Kp*Td;
N = 10;
PID = Kp + (Ki/s) + ((Kd*s)/(1 + s*Td/N));   % filtered PID (a full PID, despite the original PI name)
[y1,t1] = step(sysCL,0:1e-4:30);
sysCL2 = feedback(PID*G,1);
[y2,t2] = step(sysCL2,0:1e-4:30);
figure(9)
subplot(211)
plot(t1,y1,'linewidth',2);
title(['K_U = ' num2str(KU) ', T_U = ' num2str(TU)])
grid on
subplot(212)
plot(t2,y2,'linewidth',2);
title(['K_P = ' num2str(Kp) ', T_I = ' num2str(Ti) ', T_D = ' num2str(Td)])
grid on
figure(10)
margin(sysCL2)
figure(11)
bode(sysCL2)
figure(12)
rlocus(sysCL2)
figure(13)
nyquistplot(sysCL2)
Kp_Z_N=Kp
Ki_Z_N=Ki
Kd_Z_N=Kd
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_Z_N))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_Z_N))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd_Z_N))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
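A quick sanity check for these gains is to rebuild the same filtered PID with MATLAB's pid object and compare the closed-loop responses (a minimal sketch; C_check and sysCL3 are names introduced only here, and Tf = Td/N matches the filtered derivative used above):
C_check = pid(Kp, Ki, Kd, Td/N);        % parallel-form PID with derivative filter time constant Td/N
sysCL3  = feedback(C_check*G, 1);
figure
step(sysCL2, sysCL3, 0:1e-4:30)         % the two closed-loop responses should coincide
legend('hand-built PID','pid() object'), grid on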
I copied the algorithm from the page https://it.mathworks.com/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html and made some changes: I enlarged the observation to obsInfo = rlNumericSpec([3 1]), and I added lines to read the Kd parameter from the learned actor, to define N = 10, and to write these values into the PID Controller block of System_PID.
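For comparison, as far as I remember, the corresponding lines in the original PI example define a two-element observation and a two-element initial gain for the fullyConnectedPILayer:
obsInfo = rlNumericSpec([2 1]);     % [integrated error; error] in the original example
initialGain = single([1e-3 2]);     % initial [Ki Kp] weights of the fullyConnectedPILayer
My modified script is the following: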
s = tf('s');
G= (5*(0.5-s))/(s+2)^3
a= [1 6 12 8]
b = [-5 2.5]
[A,B,C,D] = tf2ss(b,a)
mdl = 'rl_PID_Tune';
open_system(mdl)
Ts = 0.1;
Tf = 10;
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObservations = obsInfo.Dimension(1);
numActions = prod(actInfo.Dimension);
rng(0)
initialGain = single([1e-3 2]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
criticNetwork = localCreateCriticNetwork(numObservations,numActions);
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic2 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic = [critic1 critic2];
agentOpts = rlTD3AgentOptions(...
'SampleTime',Ts,...
'MiniBatchSize',128, ...
'ExperienceBufferLength',1e6);
agentOpts.ExplorationModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agent = rlTD3Agent(actor,critic,agentOpts);
maxepisodes = 100;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-355);
% Train the agent.
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
Kd = abs(parameters{1}(3))
N = 10;
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
%% local Functions
function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and error derivative';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Build the environment interface object.
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Set a custom reset function that randomizes the reference values for the model.
env.ResetFcn = @(in)localResetFcn(in,mdl);
end
function in = localResetFcn(in,mdl)
% randomize reference signal
blk = sprintf([mdl '/Desired \nValue']);
hRef = 10 + 4*(rand-0.5);
in = setBlockParameter(in,blk,'Value',num2str(hRef));
% randomize initial height
hInit = 0;
blk = [mdl '/block system/System'];
in = setBlockParameter(in,blk,'InitialCondition',num2str(hInit));
end
function criticNetwork = localCreateCriticNetwork(numObservations,numActions)
statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedLayer(32,'Name','fc1')];
actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(32,'Name','fc2')];
commonPath = [
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','reluBody1')
fullyConnectedLayer(32,'Name','fcBody')
reluLayer('Name','reluBody2')
fullyConnectedLayer(1,'Name','qvalue')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','concat/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
end
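Once training works, a quick way to compare the RL-tuned gains with the Ziegler-Nichols ones on the plant G would be the following (a rough sketch; it assumes Kp_Z_N, Ki_Z_N and Kd_Z_N from ziegler_Nichols.m are still in the workspace, and uses Tf = 1/N so that the pid object matches the PID Controller block's filter coefficient N):
C_rl = pid(Kp, Ki, Kd, 1/N);                 % RL-tuned gains
C_zn = pid(Kp_Z_N, Ki_Z_N, Kd_Z_N, 1/N);     % Ziegler-Nichols gains
figure
step(feedback(C_rl*G,1), feedback(C_zn*G,1), 0:0.01:30)
legend('RL-tuned PID','Ziegler-Nichols PID'), grid on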
I built my model by modifying the scheme of 'rlwatertankPIDTune', and I called the script 'rl_PID.m'.
But it does not work; these are the error messages that the Command Window returns:
Error using dlnetwork/initialize (line 481)
Invalid network.
Error in dlnetwork (line 218)
net = initialize(net, dlX{:});
Error in deep.internal.sdk.dag2dlnetwork (line 48)
dlnet = dlnetwork(lg);
Error in rl.util.createInternalModelFactory (line 15)
Model = deep.internal.sdk.dag2dlnetwork(Model);
Error in rlDeterministicActorRepresentation (line 86)
Model = rl.util.createInternalModelFactory(Model, Options, ObservationNames, ActionNames, InputSize, OutputSize);
Error in rl_PID (line 24)
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
Caused by:
Layer 'Action': Error using 'predict' in layer fullyConnectedPILayer. The function threw an error and could not be executed.
Error using dlarray/fullyconnect>iValidateWeights (line 221)
The number of weights (2) for each output feature must match the number of elements (3) in each observation of the input data.
Error in dlarray/fullyconnect (line 101)
wdata = iValidateWeights(W, xsize, batchDims);
Error in fullyConnectedPILayer/predict (line 21)
Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
What does this error mean?
4 Comments
S Saha on 25 Aug 2023
What is the basis for selecting the values of the initial gains above for reinforcement-learning PID tuning?
Answers (0)