DDPG does not converge

Asked by Esan freedom on 17 May 2024
Commented: Alan on 27 Jun 2024
Hello,
I am using a DDPG agent that generates 4 continuous actions (two positive values and two negative values). The sum of the two positive actions must equal the positive part of a reference value, and the sum of the two negative actions must equal the negative part of the reference value. However, the agent does not learn to track the reference. I have tried different reward functions and hyperparameters, but after a while it always chooses the extreme values of the defined action ranges ([-1 -1 1 1]).
I would appreciate any suggestions.
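For concreteness, the tracking requirement described above can be written out as a small check. This is only a sketch: the variable names `ref` and `a`, the numeric values, and the quadratic penalty are assumptions for illustration, not the reward used in the actual model. The ordering of the actions (first two negative, last two positive) follows the limits in the code below.

```matlab
% Hypothetical values, for illustration only
ref = -0.75;                   % reference value to track
a   = [-0.5; -0.25; 0; 0];     % actions: a(1:2) in [-1,0], a(3:4) in [0,1]

refPos = max(ref, 0);          % positive part of the reference
refNeg = min(ref, 0);          % negative part of the reference

errPos = sum(a(3:4)) - refPos; % mismatch of the two positive actions
errNeg = sum(a(1:2)) - refNeg; % mismatch of the two negative actions

% One possible (assumed) penalty term: zero only when both sums match
penalty = -(errPos^2 + errNeg^2);
```

With these example values both errors are zero, so the penalty is zero. Any action vector whose sums miss the two reference parts gives a negative penalty.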
open_system(mdl)
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';
numObservations = obsInfo.Dimension(1);
actInfo = rlNumericSpec([4 1],...
LowerLimit=[-1 -1 0 0]',...
UpperLimit=[0 0 1 1]');
numActions = actInfo.Dimension(1);
% Build the environment interface object
agentblk = 'MEMG_RL/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
Ts = 2e-2;
Tf = 60;
statepath = [featureInputLayer(numObservations, Name = 'stateinp')
    fullyConnectedLayer(96, Name = 'stateFC1')
    reluLayer
    fullyConnectedLayer(74, Name = 'stateFC2')
    reluLayer
    fullyConnectedLayer(36, Name = 'stateFC3')];
actionpath = [featureInputLayer(numActions, Name = 'actinp')
    fullyConnectedLayer(72, Name = 'actFC1')
    reluLayer
    fullyConnectedLayer(36, Name = 'actFC2')];
commonpath = [additionLayer(2, Name = 'add')
    fullyConnectedLayer(96, Name = 'FC1')
    reluLayer
    fullyConnectedLayer(72, Name = 'FC2')
    reluLayer
    fullyConnectedLayer(24, Name = 'FC3')
    reluLayer
    fullyConnectedLayer(1, Name = 'output')];
critic_network = layerGraph();
critic_network = addLayers(critic_network,actionpath);
critic_network = addLayers(critic_network,statepath);
critic_network = addLayers(critic_network,commonpath);
critic_network = connectLayers(critic_network,'actFC2','add/in1');
critic_network = connectLayers(critic_network,'stateFC3','add/in2');
plot(critic_network)
critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
'ObservationInputNames','stateinp','ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObservations, Name = 'observation')
    fullyConnectedLayer(72, Name = 'actorFC1')
    reluLayer
    fullyConnectedLayer(48, Name = 'actorFc2')
    reluLayer
    fullyConnectedLayer(36, Name = 'actorFc3')
    reluLayer
    fullyConnectedLayer(numActions, Name = 'output')
    tanhLayer
    scalingLayer(Name = 'actorscaling', scale = max(actInfo.UpperLimit))];
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',3e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.1; %.07/sqrt(Ts) ;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-6;
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',20, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',5000);
agent = rlDDPGAgent(actor,critic,agentOptions);
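One detail in the code above that may be worth checking: the actor ends in `tanhLayer` followed by a `scalingLayer` with a scalar scale of `max(actInfo.UpperLimit)`, which is 1, and no bias. Every action therefore spans [-1, 1], while the action spec limits the first two actions to [-1, 0] and the last two to [0, 1]. The sketch below, offered as a possible adjustment rather than a confirmed fix, computes a per-action scale and bias that map the tanh output onto the declared limits. The variable names are chosen for illustration, and whether this resolves the saturation at [-1 -1 1 1] has not been verified against the model.

```matlab
% Action limits copied from the rlNumericSpec definition above
lowerLimit = [-1 -1 0 0]';
upperLimit = [ 0  0 1 1]';

% Map the tanh output range [-1, 1] onto [lowerLimit, upperLimit]
scale = (upperLimit - lowerLimit)/2;   % half-width of each action range
bias  = (upperLimit + lowerLimit)/2;   % midpoint of each action range

% Check the mapping at the two ends of the tanh range
actMin = scale*(-1) + bias;            % equals lowerLimit
actMax = scale*( 1) + bias;            % equals upperLimit

% In the actor network, the final layer could then be written as
% (requires Reinforcement Learning Toolbox):
%   scalingLayer(Name="actorscaling", Scale=scale, Bias=bias)
```

With this mapping, the midpoint of the tanh output corresponds to the middle of each action range instead of to one of its limits, so the actor does not have to sit on the edge of the allowed range to produce small actions.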
Comments: 2
Esan freedom on 20 May 2024
@Emmanouil Tzorakoleftherakis
It learns and then gets lost again, as the reward plot shows.
I would appreciate a quick response.
Alan on 27 Jun 2024
Hi Esan,
Could you provide the .slx file of the Simulink model?
Regards.


Answers (0)
