Although I adjusted the Noise Options, DDPG actions are always equal to the maximum or minimum value.

Question

Petre Ricioppo, March 10, 2023

https://kr.mathworks.com/matlabcentral/answers/1926610-although-i-adjusted-the-noise-options-ddpg-actions-are-always-equal-to-the-maximum-and-minimum-value

Commented: Emmanouil Tzorakoleftherakis, March 15, 2023

During training, the agent's action is always equal to either the minimum or the maximum value, so the agent seems to learn nothing.

The model used is the Ackermann Kinematics Model in Simulink. The main code is the following:

%% Define Environment Limits
% Room of 10x10 meters. Each point is one centimeter.                                                                           
TopWall = [1 1000 1000 1 1; 980 980 1000 1000 980];                        % Top wall with 20cm of safety
LeftWall = [1 20 20 1 1; 1 1 1000 1000 1];                                 % Left wall with 20cm of safety
BottomWall = [1 1000 1000 1 1; 1 1 20 20 1];                               % Bottom wall with 20cm of safety
% The goal is the right wall
GoalLine = [990 1000 1000 990 990; 1 1 1000 1000 1];                       % Goal line with 10cm of threshold
GoalPoint = [1000, 500];                                                   % Goal point at the center of the goal line
%% Define Robot Dimensions
h = 50;                                                                    % Length of 50cm 
w = 20;                                                                    % Width of 20cm
%% Define Environment RL
obsInfo = rlNumericSpec([6 1]);
actInfo = rlNumericSpec([2 1],"UpperLimit",[10; 0.23],"LowerLimit",[0; -0.23]);
% actInfo = rlFiniteSetSpec({[10 0], [10 0.3*0.215], [10 -0.3*0.215], [10 0.215], ...
%     [10 -0.215], [0.3*10 0.3*0.215], [0.3*10 -0.3*0.215], [0.3*10 0.215], [0.3*10 -0.215], ...
%    [0 0.215], [0 -0.215]});
env = rlSimulinkEnv("Simple_Model","Simple_Model/RlController",obsInfo,actInfo);
env.ResetFcn = @randomstart;
%% Define Agent
actnet = [featureInputLayer(6,"Name","obs")
    fullyConnectedLayer(100,"Name","fc1")
    reluLayer("Name","relu1")
    fullyConnectedLayer(100,"Name","fc2")
    reluLayer("Name","relu2")
    fullyConnectedLayer(2,"Name","act")
    tanhLayer("Name","scacttanh")
    scalingLayer("Name","scact","Scale",[5; 0.23],"Bias",[5; 0])];
optsActor = rlRepresentationOptions("LearnRate",0.0005,"GradientThreshold",10);
actor = rlDeterministicActorRepresentation(actnet,obsInfo,actInfo, ...
    "Observation","obs","Action","scact",optsActor);
obsPath = [featureInputLayer(6,"Name","obs")
    fullyConnectedLayer(50,"Name","fc1")
    reluLayer("Name","relu1")
    additionLayer(2,"Name","add")
    fullyConnectedLayer(50,"Name","fc3")
    reluLayer("Name","relu3")
    fullyConnectedLayer(1,"Name","value")];
actPath = [featureInputLayer(2,"Name","act")
    fullyConnectedLayer(50,"Name","fcact")];
valnet = layerGraph(obsPath);
valnet = addLayers(valnet,actPath);
valnet = connectLayers(valnet,"fcact","add/in2");
optsCritic = rlRepresentationOptions("LearnRate",0.001,"GradientThreshold",10);
critic = rlQValueRepresentation(valnet,obsInfo,actInfo, ...
    "Observation","obs","Action","act",optsCritic);
dt = 0.25;
opts = rlDDPGAgentOptions("SampleTime",dt,"ExperienceBufferLength", ...
    1e5,"MiniBatchSize",128);
opt.NoiseOptions.StandardDeviation = [0.04 0.0092];
opt.NoiseOptions.StandardDeviationDecayRate = 1e-4;
agent = rlDDPGAgent(actor,critic,opts);
%% Train Agent
clear MyRobotPlot
opts = rlTrainingOptions( ...
        "MaxStepsPerEpisode",250, ... 
        "MaxEpisodes",50000, ...
        "ScoreAveragingWindowLength",50, ...
        "StopTrainingCriteria","AverageReward", ...
        "StopTrainingValue",5000);
info = train(agent,env,opts);

Note that I already set the standard deviation to 1% of the action range!

The reward function is based on the distance between the goal and the agent. The observations are the outputs of the Ackermann Kinematics Model block.
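
For a sense of scale, here is a minimal sketch that simulates Ornstein-Uhlenbeck exploration noise and compares its spread with half the width of each action range from the post. The update rule, the mean attraction constant of 0.15, the zero mean, and the comparison value of 0.3 for the standard deviation are assumptions based on my reading of the toolbox documentation and defaults; they are not stated in the post and should be checked against your release.

```matlab
% Sketch: size of Ornstein-Uhlenbeck (OU) exploration noise versus the action range.
% Assumed update rule, as I understand the toolbox documentation:
%   x(k+1) = x(k) + theta*(mu - x(k))*Ts + sigma.*randn*sqrt(Ts)
rng(0);                               % reproducible random draws
Ts        = 0.25;                     % agent sample time from the post
theta     = 0.15;                     % assumed default MeanAttractionConstant
mu        = [0; 0];                   % assumed default noise mean
halfRange = [5; 0.23];                % half-width of the speed and steering ranges
sigmaCases = [0.3 0.04; 0.3 0.0092];  % columns: assumed default, values from the post
labels     = ["sigma = 0.3 (assumed default)", "sigma from the post"];
nSteps     = 5000;

noiseStd    = zeros(2,2);             % empirical noise std, one column per case
fracOutside = zeros(2,2);             % share of samples larger than the half-range
for c = 1:2
    sigma = sigmaCases(:,c);
    x = zeros(2,nSteps);              % noise trajectory, starting at zero
    for k = 1:nSteps-1
        x(:,k+1) = x(:,k) + theta*(mu - x(:,k))*Ts + sigma.*randn(2,1)*sqrt(Ts);
    end
    noiseStd(:,c)    = std(x,0,2);
    fracOutside(:,c) = mean(abs(x) > halfRange,2);
    fprintf("%s: std = [%.3f %.3f], share beyond half-range = [%.2f %.2f]\n", ...
        labels(c),noiseStd(:,c),fracOutside(:,c));
end
```

If the noise spread for a channel is larger than its half-range, an actor output near the middle of that range would be clipped to a limit for a large share of steps, which would look like saturated actions during training.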

Answer 1

Emmanouil Tzorakoleftherakis, March 13, 2023

https://kr.mathworks.com/matlabcentral/answers/1926610-although-i-adjusted-the-noise-options-ddpg-actions-are-always-equal-to-the-maximum-and-minimum-value#answer_1191955

At first glance I don't see anything wrong. A couple of suggestions:

1) Try reducing the noise variance further, until you see that the actions are no longer fully saturated.

2) Use the default agent constructor to make sure this is not an issue with how you designed your actor/critic.
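
The two suggestions could be sketched as follows. This assumes Reinforcement Learning Toolbox and the `actor`, `critic`, `obsInfo`, and `actInfo` variables from the question; the names `agentOpts` and `defaultAgent` are mine. One thing worth checking first: the posted code creates the options object as `opts` but assigns the noise settings to `opt`. If that is not just a transcription slip, MATLAB would silently create a separate struct named `opt`, and the agent would keep its default noise settings.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox and the actor, critic,
% obsInfo and actInfo variables defined in the question.
dt = 0.25;
agentOpts = rlDDPGAgentOptions("SampleTime",dt, ...
    "ExperienceBufferLength",1e5,"MiniBatchSize",128);

% Suggestion 1: set the noise on the SAME options object that is passed to
% rlDDPGAgent. A column vector is used here to match the [2 1] action spec
% (an assumption; the post uses a row vector).
agentOpts.NoiseOptions.StandardDeviation = [0.04; 0.0092];
agentOpts.NoiseOptions.StandardDeviationDecayRate = 1e-4;
agent = rlDDPGAgent(actor,critic,agentOpts);

% Confirm which noise settings the agent actually carries.
disp(agent.AgentOptions.NoiseOptions)

% Suggestion 2: let the toolbox build default actor and critic networks
% directly from the observation and action specifications.
defaultAgent = rlDDPGAgent(obsInfo,actInfo,agentOpts);
```

If `defaultAgent` trains without saturating while the custom agent does not, the custom actor or critic design is the more likely culprit; if both saturate, the noise settings or the environment deserve a closer look.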

Petre Ricioppo, March 15, 2023

Hi Emmanouil, by reducing the noise variance I got better results.

But now the average reward increases during the first 1000 episodes and then, after reaching a maximum that still does NOT give acceptable results, it no longer seems to increase for several thousand episodes. I've already tried reducing the learning rates of both the actor and the critic.

Could it be that the agent doesn't explore enough? If yes, how can I improve it?

Emmanouil Tzorakoleftherakis, March 15, 2023

Now that's a separate question with no easy answer :). There could be a lot of things to blame if you are not getting good results. Note however that the episode reward you see in the episode manager is not expected to always increase monotonically. The agent may explore an area that leads to lower rewards before reaching a peak again.

There is a somewhat intuitive way to check that your agent is exploring: 1) make sure the decay rate of your noise model is zero, so that there will always be some amount of exploration, and 2) plot the actual actions generated by the agent for a few episodes to see whether you are getting good coverage of your action space.
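
To illustrate both points, here is a rough sketch. The first part only does arithmetic on the values from the question, assuming the standard deviation shrinks by a factor of `(1 - decayRate)` at every agent step, which is how I read the toolbox documentation. The second part runs only if `agent` and `env` from the question are in the workspace; it uses `sim` and reads the logged action from the returned experience, and the field lookup via `fieldnames` is an assumption about how the action is named.

```matlab
% Part 1: how quickly the exploration noise decays (assumed per-step rule:
% sigma(k+1) = sigma(k)*(1 - decayRate), ignoring any lower bound).
decayRate       = 1e-4;              % value from the post
stepsPerEpisode = 250;               % MaxStepsPerEpisode from the post (upper bound)
episodes        = [10 100 1000];

steps     = episodes*stepsPerEpisode;        % agent steps if every episode runs fully
remaining = (1 - decayRate).^steps;          % fraction of the initial std left
for i = 1:numel(episodes)
    fprintf("after %4d episodes: %.3g of the initial std remains\n", ...
        episodes(i),remaining(i));
end
halfLifeSteps = log(0.5)/log(1 - decayRate); % steps needed to halve the std
fprintf("half-life: about %.0f steps (%.1f full-length episodes)\n", ...
    halfLifeSteps,halfLifeSteps/stepsPerEpisode);

% Part 2: action coverage for one simulated episode (toolbox required).
if exist("agent","var") && exist("env","var")
    experience = sim(env,agent,rlSimulationOptions("MaxSteps",stepsPerEpisode));
    names = fieldnames(experience.Action);              % action channel name(s)
    act = squeeze(experience.Action.(names{1}).Data);   % expected to be 2-by-N
    if size(act,1) ~= 2
        act = act.';                                    % fall back to N-by-2 layout
    end
    figure
    subplot(2,1,1); histogram(act(1,:),20); xlim([0 10]);
    xlabel("speed command"); ylabel("count")
    subplot(2,1,2); histogram(act(2,:),20); xlim([-0.23 0.23]);
    xlabel("steering command"); ylabel("count")
end
```

With a decay rate of 1e-4 the half-life works out to roughly 6,900 steps, or about 28 full-length episodes, so after 1000 full-length episodes almost no exploration noise would be left. That would be consistent with the reward flattening out around that point, although other causes are possible. Setting `StandardDeviationDecayRate` to 0 keeps the noise constant. Note that `sim` may run the agent without exploration noise, depending on the release and agent settings; logging the action signal in the Simulink model during training is an alternative way to see the explored actions.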
