DDPG agent does not learn

dani ansari on 22 Jul 2023
Commented: Harold on 31 Mar 2025
Hi, I'm using a DDPG algorithm to tune the gains of a PD-like (transpose-Jacobian) controller. The gains need to be between 0.00001 and 0.01, and based on this range I tune my exploration variance so that variance*sqrt(sample time) = 10% of the range.
But my agent does not learn: the reward just shows occasional peaks and then falls back to the minimum again. I don't know why this is happening.
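For example, this is roughly how I work out the noise standard deviation from that rule, using my gain range and the 0.001 s sample time set in the agent options below:
Ts = 0.001;                      % agent sample time
gainRange = 0.01 - 0.00001;      % upper gain limit minus lower gain limit
sigma = 0.1*gainRange/sqrt(Ts);  % so that sigma*sqrt(Ts) is 10% of the range
% sigma comes out to about 0.0316, which is why StandardDeviation is set near 0.03 below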
My networks are constructed as follows:
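%% critic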
statepath = [featureInputLayer(numObs , Name = 'stateinp')
fullyConnectedLayer(96,Name = 'stateFC1')
reluLayer
fullyConnectedLayer(74,Name = 'stateFC2')
reluLayer
fullyConnectedLayer(36,Name = 'stateFC3')]
actionpath = [featureInputLayer(numAct, Name = 'actinp')
fullyConnectedLayer(72,Name = 'actFC1')
reluLayer
fullyConnectedLayer(36,Name = 'actFC2')]
commonpath = [additionLayer(2,Name = 'add')
fullyConnectedLayer(96,Name = 'FC1')
reluLayer
fullyConnectedLayer(72,Name = 'FC2')
reluLayer
fullyConnectedLayer(24,Name = 'FC3')
reluLayer
fullyConnectedLayer(1,Name = 'output')]
critic_network = layerGraph()
critic_network = addLayers(critic_network,actionpath)
critic_network = addLayers(critic_network,statepath)
critic_network = addLayers(critic_network,commonpath)
critic_network = connectLayers(critic_network,'actFC2','add/in1')
critic_network = connectLayers(critic_network,'stateFC3','add/in2')
plot(critic_network)
critic = dlnetwork(critic_network)
criticOptions = rlOptimizerOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
'ObservationInputNames','stateinp','ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObs,Name = 'observation')
fullyConnectedLayer(72,Name = 'actorFC1')
reluLayer
fullyConnectedLayer(48,Name='actorFc2')
reluLayer
fullyConnectedLayer(36,Name='actorFc3')
reluLayer
fullyConnectedLayer(numAct,Name='output')
tanhLayer
scalingLayer(Name = 'actorscaling', Scale = max(actInfo.UpperLimit))]
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime',0.001,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor,critic,agentOptions);

Answers (2)

Mrutyunjaya Hiremath on 23 Jul 2023
Check this:
% Define the observation and action space
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
actorNetwork = dlnetwork(actorNetwork);
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
critic = dlnetwork(criticNetwork);
% Create the actor and critic options
actorOptions = rlRepresentationOptions('LearnRate', 5e-4);
criticOptions = rlRepresentationOptions('LearnRate', 1e-3);
% Create the actor and critic representations
actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, 'Observation', 'observation', 'Action', 'actorscaling', actorOptions);
critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, 'Observation', 'stateinp', 'Action', 'actinp', criticOptions);
% Create the DDPG agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime', 0.001,...
'ExperienceBufferLength', 1e6,...
'MiniBatchSize', 128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000,...
'MaxStepsPerEpisode', 1000,...
'ScoreAveragingWindowLength', 5,...
'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);

awcii on 24 Jul 2023
.
1 Comment
Harold on 31 Mar 2025
@awcii Hello, what would you like to discuss? I'm still not sure exactly what the issue is. Please be clear about the problems you are encountering; if it matches my understanding, I'm happy to help.
