RL agent does not learn properly

Franz Schnyder on 20 Mar 2023
Commented: Franz Schnyder on 17 Aug 2023
Hello everyone
I am learning the Reinforcement Learning Toolbox and want to control the speed of a DC motor with an RL agent that replaces a PI controller. I based my setup on the water tank example, but I am running into problems during training.
First, the agent tends to settle at either the minimum (0 rpm) or the maximum (6000 rpm) and then stops changing its output, even though it had already achieved good rewards in earlier episodes.
In my reward function I use the error between the target and measured speed as a percentage. When I add a penalty to the reward so that the agent does not stay at 0 rpm, it still stays at 0 rpm and does not explore the rest of the range. I also have trouble getting rid of the remaining steady-state control error.
The code and some pictures are below.
close all

% DC motor parameters
R = 7.03;
L = 1.04*10^-3;
J = 44.2*10^-7;
a = 2.45*10^-6;
Kn = 250*2*pi/60;   % 250 rpm converted to rad/s
Km = 38.2*10^-3;

% action: motor voltage ("spannung"), limited to 0..24 V
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',24);
actInfo.Name = 'spannung';

% observations: integrated error, error, and measured rpm
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit',[-inf -inf -inf]', ...
    'UpperLimit',[ inf  inf  inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured rpm';

% Simulink environment containing the RL Agent block
env = rlSimulinkEnv("DCMotorRL2",'DCMotorRL2/RL Agent', ...
    obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);

Ts = 0.1;   % agent sample time
Tf = 20;    % simulation time
rng(0)

% critic: Q(s,a) network with separate state and action input paths
statePath = [
    featureInputLayer(obsInfo.Dimension(1),Name="netObsIn")
    fullyConnectedLayer(50)
    reluLayer
    fullyConnectedLayer(25,Name="CriticStateFC2")];
actionPath = [
    featureInputLayer(actInfo.Dimension(1),Name="netActIn")
    fullyConnectedLayer(25,Name="CriticActionFC1")];
commonPath = [
    additionLayer(2,Name="add")
    reluLayer
    fullyConnectedLayer(1,Name="CriticOutput")];

criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,"CriticStateFC2","add/in1");
criticNetwork = connectLayers(criticNetwork,"CriticActionFC1","add/in2");
criticNetwork = dlnetwork(criticNetwork);

figure
plot(criticNetwork)

critic = rlQValueFunction(criticNetwork,obsInfo,actInfo, ...
    ObservationInputNames="netObsIn", ...
    ActionInputNames="netActIn");

% actor: small deterministic policy network
actorNetwork = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(9) %3
    tanhLayer
    fullyConnectedLayer(actInfo.Dimension(1))];
actorNetwork = dlnetwork(actorNetwork);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);

% DDPG agent and its options
agent = rlDDPGAgent(actor,critic);
agent.SampleTime = Ts;
agent.AgentOptions.TargetSmoothFactor = 1e-3;
agent.AgentOptions.DiscountFactor = 1.0;
agent.AgentOptions.MiniBatchSize = 64;
agent.AgentOptions.ExperienceBufferLength = 1e6;
agent.AgentOptions.NoiseOptions.Variance = 0.8;           %0.3
agent.AgentOptions.NoiseOptions.VarianceDecayRate = 1e-5; %-5
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-03;
agent.AgentOptions.CriticOptimizerOptions.GradientThreshold = 1;
agent.AgentOptions.ActorOptimizerOptions.LearnRate = 1e-04;
agent.AgentOptions.ActorOptimizerOptions.GradientThreshold = 1;

trainOpts = rlTrainingOptions( ...
    MaxEpisodes=4000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...
    ScoreAveragingWindowLength=20, ...
    Verbose=false, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=800, ...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=600);

doTraining = true;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
end

function in = localResetFcn(in)
% randomize the speed reference at the start of each episode
blk = 'DCMotorRL2/omega_ref';
h = randi([2000,4000]);
in = setBlockParameter(in,blk,'Value',num2str(h));
% optional: randomize the initial motor speed as well (in 1/min)
% h = randi([2000,4000])*(2*pi)/60;
% blk = 'DCMotorRL2/DCMotor/Integrator1';
% in = setBlockParameter(in,blk,'InitialCondition',num2str(h));
end
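For context, a reward based on the percentage speed error could be sketched like this inside a MATLAB Function block. The function name, the 50% threshold, and the penalty size are only illustrative placeholders, not the values used in the actual model:
function r = speedReward(omega_ref, omega_meas)
% Illustrative sketch only: reward close tracking, penalize large errors
pctErr = abs(omega_ref - omega_meas)/max(abs(omega_ref),1); % relative speed error
r = 1 - pctErr;     % close to 1 when the speed tracks the reference
if pctErr > 0.5     % extra penalty when far off, e.g. parked at 0 rpm
    r = r - 1;
end
end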
2 Comments

awcii on 17 Aug 2023
Did you solve your problem?

Franz Schnyder on 17 Aug 2023
Yes, increasing the noise variance helped. In general, though, I had to change many other settings, such as the observations and the neural networks, to get a more or less satisfactory result. In the end the agent oscillated slightly around the setpoint in simulation, and that had too strong an influence on the real setup.


Accepted Answer

Emmanouil Tzorakoleftherakis on 20 Mar 2023
Some comments:
1) 150 episodes is really not much; you need to let the training continue for quite a bit longer.
2) There is no guarantee that the reward will always go up. It may drop while the agent explores, and the agent may still find a better policy along the way.
3) Noise variance is critical with DDPG agents. Make sure this value is between 1% and 10% of your action range (a rough example is sketched after this list).
4) A sample time of 0.1 seconds seems a bit too large for a motor control application.
5) This example does FOC with RL, but you may be able to use it for general information:
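As a rough illustration of points 3) and 4) applied to the script in the question, the settings could be changed along these lines; the 5% noise level and the 0.01 s sample time are example values inside the suggested ranges, not prescribed numbers:
% Example values only, continuing from the script in the question
actionRange = 24 - 0;                                        % from the actInfo limits
agent.AgentOptions.NoiseOptions.Variance = 0.05*actionRange; % ~5% of the action range
Ts = 0.01;                                                   % smaller agent sample time
agent.SampleTime = Ts;
trainOpts.MaxStepsPerEpisode = ceil(Tf/Ts);                  % keep the 20 s episode length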

More Answers (0)

Release: R2022b
