
DDPG agent does not learn

6 views (last 30 days)
dani ansari on 22 Jul 2023
Answered: awcii on 24 Jul 2023
Hi, I'm using a DDPG algorithm to tune the gains of a PD-like (transpose-Jacobian) controller. The gains need to lie between 0.00001 and 0.01, and based on this range I tune the exploration noise so that standard deviation * sqrt(sample time) equals 10% of the range.
However, the agent does not learn: the episode reward occasionally shows peaks, but afterwards it falls back to the minimum, and I don't know why this is happening.
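For reference, the arithmetic behind that noise rule, restated as a quick sketch using the numbers above:

Ts = 0.001;                   % agent sample time
range = 0.01 - 0.00001;       % gain range, roughly 0.01
stdDev = 0.1*range/sqrt(Ts)   % ~0.0316, rounded to 0.03 in the agent options below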
My network architecture is:
statepath = [featureInputLayer(numObs, Name='stateinp')
    fullyConnectedLayer(96, Name='stateFC1')
    reluLayer
    fullyConnectedLayer(74, Name='stateFC2')
    reluLayer
    fullyConnectedLayer(36, Name='stateFC3')];
actionpath = [featureInputLayer(numAct, Name='actinp')
    fullyConnectedLayer(72, Name='actFC1')
    reluLayer
    fullyConnectedLayer(36, Name='actFC2')];
commonpath = [additionLayer(2, Name='add')
    fullyConnectedLayer(96, Name='FC1')
    reluLayer
    fullyConnectedLayer(72, Name='FC2')
    reluLayer
    fullyConnectedLayer(24, Name='FC3')
    reluLayer
    fullyConnectedLayer(1, Name='output')];
critic_network = layerGraph();
critic_network = addLayers(critic_network, actionpath);
critic_network = addLayers(critic_network, statepath);
critic_network = addLayers(critic_network, commonpath);
critic_network = connectLayers(critic_network, 'actFC2', 'add/in1');
critic_network = connectLayers(critic_network, 'stateFC3', 'add/in2');
plot(critic_network)
critic = dlnetwork(critic_network);
criticOptions = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueFunction(critic, obsInfo, actInfo, ...
    'ObservationInputNames','stateinp', 'ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObs, Name='observation')
    fullyConnectedLayer(72, Name='actorFC1')
    reluLayer
    fullyConnectedLayer(48, Name='actorFc2')
    reluLayer
    fullyConnectedLayer(36, Name='actorFc3')
    reluLayer
    fullyConnectedLayer(numAct, Name='output')
    tanhLayer
    scalingLayer(Name='actorscaling', Scale=max(actInfo.UpperLimit))];
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',0.001, ...
    'ActorOptimizerOptions',actorOptions, ...
    'CriticOptimizerOptions',criticOptions, ...
    'ExperienceBufferLength',1e6, ...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor,critic,agentOptions);
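Note that tanhLayer outputs values in [-1, 1], so scaling by max(actInfo.UpperLimit) alone produces actions in [-0.01, 0.01], which includes negative gains outside the 0.00001 to 0.01 range above. One way to map the tanh output onto that interval (a sketch, assuming actInfo carries those limits as LowerLimit/UpperLimit) uses the Scale and Bias properties of scalingLayer:

lb = actInfo.LowerLimit;   % assumed 0.00001
ub = actInfo.UpperLimit;   % assumed 0.01
% y = x*(ub-lb)/2 + (ub+lb)/2 maps the tanh output [-1,1] onto [lb,ub]
scalingLayer(Name='actorscaling', Scale=(ub-lb)/2, Bias=(ub+lb)/2)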

Answers (2)

Mrutyunjaya Hiremath on 23 Jul 2023
Check this:
% Define the observation and action space
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
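% Note: obsInfo and actInfo are used below but not defined in this snippet.
% One possible definition (a sketch, assuming the action is the gain vector
% bounded to [1e-5, 1e-2] as in the question):
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1], 'LowerLimit', 1e-5, 'UpperLimit', 1e-2);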
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
actorNetwork = dlnetwork(actorNetwork);
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
criticNetwork = dlnetwork(criticNetwork);
% Create the actor and critic optimizer options
actorOptions = rlOptimizerOptions('LearnRate',5e-4,'GradientThreshold',1);
criticOptions = rlOptimizerOptions('LearnRate',1e-3,'GradientThreshold',1);
% Create the actor and critic function approximators
actor = rlContinuousDeterministicActor(actorNetwork, obsInfo, actInfo);
critic = rlQValueFunction(criticNetwork, obsInfo, actInfo, ...
    'ObservationInputNames','stateinp', 'ActionInputNames','actinp');
% Create the DDPG agent from the actor and critic
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',0.001, ...
    'ActorOptimizerOptions',actorOptions, ...
    'CriticOptimizerOptions',criticOptions, ...
    'ExperienceBufferLength',1e6, ...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000,...
'MaxStepsPerEpisode', 1000,...
'ScoreAveragingWindowLength', 5,...
'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);
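After training, a quick way to sanity-check the learned policy (a sketch; env is assumed to be the same environment object passed to train above):

simOpts = rlSimulationOptions('MaxSteps',1000);
experience = sim(env, agent, simOpts);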

awcii on 24 Jul 2023
.
