
DDPG agent does not learn

6 views (last 30 days)
dani ansari on 22 Jul 2023
Answered: awcii on 24 Jul 2023
Hi, I'm using a DDPG algorithm to tune the gains of a PD-like (transpose-Jacobian) controller. The gains need to lie between 0.00001 and 0.01, and based on this range I tune the exploration noise so that standard deviation * sqrt(sample time) equals 10% of the range.
However, the agent does not learn: the episode reward occasionally shows peaks, but afterwards it falls back to the minimum, and I don't know why this is happening.
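For reference, the arithmetic behind that noise rule, restated as a quick sketch using the numbers above:

Ts = 0.001;                   % agent sample time
range = 0.01 - 0.00001;       % gain range, roughly 0.01
stdDev = 0.1*range/sqrt(Ts)   % ~0.0316, rounded to 0.03 in the agent options below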
My network architecture is:
statepath = [featureInputLayer(numObs, Name='stateinp')
    fullyConnectedLayer(96, Name='stateFC1')
    reluLayer
    fullyConnectedLayer(74, Name='stateFC2')
    reluLayer
    fullyConnectedLayer(36, Name='stateFC3')];
actionpath = [featureInputLayer(numAct, Name='actinp')
    fullyConnectedLayer(72, Name='actFC1')
    reluLayer
    fullyConnectedLayer(36, Name='actFC2')];
commonpath = [additionLayer(2, Name='add')
    fullyConnectedLayer(96, Name='FC1')
    reluLayer
    fullyConnectedLayer(72, Name='FC2')
    reluLayer
    fullyConnectedLayer(24, Name='FC3')
    reluLayer
    fullyConnectedLayer(1, Name='output')];
critic_network = layerGraph();
critic_network = addLayers(critic_network, actionpath);
critic_network = addLayers(critic_network, statepath);
critic_network = addLayers(critic_network, commonpath);
critic_network = connectLayers(critic_network, 'actFC2', 'add/in1');
critic_network = connectLayers(critic_network, 'stateFC3', 'add/in2');
plot(critic_network)
critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueFunction(critic, obsInfo, actInfo, ...
    'ObservationInputNames','stateinp', 'ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObs, Name='observation')
    fullyConnectedLayer(72, Name='actorFC1')
    reluLayer
    fullyConnectedLayer(48, Name='actorFc2')
    reluLayer
    fullyConnectedLayer(36, Name='actorFc3')
    reluLayer
    fullyConnectedLayer(numAct, Name='output')
    tanhLayer
    scalingLayer(Name='actorscaling', Scale=max(actInfo.UpperLimit))];
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',0.001, ...
    'ActorOptimizerOptions',actorOptions, ...
    'CriticOptimizerOptions',criticOptions, ...
    'ExperienceBufferLength',1e6, ...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor,critic,agentOptions);
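Note that tanhLayer outputs values in [-1, 1], so scaling by max(actInfo.UpperLimit) alone produces actions in [-0.01, 0.01], which includes negative gains outside the 0.00001 to 0.01 range above. One way to map the tanh output onto that interval (a sketch, assuming actInfo carries those limits as LowerLimit/UpperLimit) uses the Scale and Bias properties of scalingLayer:

lb = actInfo.LowerLimit;   % assumed 0.00001
ub = actInfo.UpperLimit;   % assumed 0.01
% y = x*(ub-lb)/2 + (ub+lb)/2 maps the tanh output [-1,1] onto [lb,ub]
scalingLayer(Name='actorscaling', Scale=(ub-lb)/2, Bias=(ub+lb)/2)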

Answers (2)

Mrutyunjaya Hiremath on 23 Jul 2023
Check this:
% Define the observation and action space
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
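% Note: obsInfo and actInfo are used below but not defined in this snippet.
% One possible definition (a sketch, assuming the action is the gain vector
% bounded to [1e-5, 1e-2] as in the question):
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1], 'LowerLimit', 1e-5, 'UpperLimit', 1e-2);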
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
actorNetwork = dlnetwork(actorNetwork);
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
criticNetwork = dlnetwork(criticNetwork);
% Create the actor and critic optimizer options
actorOptions = rlOptimizerOptions('LearnRate',5e-4,'GradientThreshold',1);
criticOptions = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);
% Create the actor and critic function approximators
actor = rlContinuousDeterministicActor(actorNetwork, obsInfo, actInfo);
critic = rlQValueFunction(criticNetwork, obsInfo, actInfo, ...
    'ObservationInputNames','stateinp', 'ActionInputNames','actinp');
% Create the DDPG agent from the actor and critic
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',0.001, ...
    'ActorOptimizerOptions',actorOptions, ...
    'CriticOptimizerOptions',criticOptions, ...
    'ExperienceBufferLength',1e6, ...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000,...
'MaxStepsPerEpisode', 1000,...
'ScoreAveragingWindowLength', 5,...
'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);
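After training, a quick way to sanity-check the learned policy (a sketch; env is assumed to be the same environment object passed to train above):

simOpts = rlSimulationOptions('MaxSteps',1000);
experience = sim(env, agent, simOpts);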

awcii on 24 Jul 2023
.
