RL DDPG agent does not seem to learn, aircraft control problem

Question

Leonardo Molino on 2 Aug 2024

https://kr.mathworks.com/matlabcentral/answers/2142506-rl-ddpg-agent-does-not-seem-to-learn-aircraft-control-problem

Commented: Leonardo Molino on 6 Aug 2024

Hello everyone,

I’m back with some updates on my mixed Reinforcement Learning (RL) and Supervised Learning training. A few days ago, I posted a question here on MathWorks about how the “external actions” input of the RL Agent block works. Based on the suggestions I received, I have started a hybrid training approach.

I begin by injecting external actions from the controller for the first 75 seconds (1/4 of the episode length). After that, the agent acts until the pitch rate error reaches 5 degrees per second. When this threshold is reached, the external controller takes control again. The external actions are cut off again once the pitch rate has stayed very close to 0 degrees per second for about 40 seconds; the agent then takes control again, and the cycle repeats.
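A minimal sketch of the handover logic described above, written as plain MATLAB over a precomputed pitch rate error signal so it can run outside Simulink. Everything beyond the numbers in the post is an assumption: the tolerance `qTol` for "very close to 0 deg/s", the cap `maxInterventions`, the synthetic signal, and treating the pitch rate error as the pitch rate itself (i.e., a zero pitch rate command). In a Simulink model, the same logic would presumably sit in a MATLAB Function block that drives the RL Agent block's external-action ports.

```matlab
% Hypothetical supervisor for the agent/external-controller handover.
% Time is counted in integer steps to avoid floating-point comparisons.
Ts      = 0.1;             % agent sample time (s), from the post
N       = round(300/Ts);   % 300 s episode (75 s is 1/4 of it)
nInit   = round(75/Ts);    % steps of the initial external-control phase
nSettle = round(40/Ts);    % calm steps required before handing back
qErrMax = 5;               % intervention threshold (deg/s), from the post
qTol    = 0.1;             % assumed meaning of "very close to 0" (deg/s)
maxInterventions = 5;      % hypothetical cap on interventions

% Synthetic pitch rate error (deg/s): the agent "fails" from 100 s to 110 s
qErr = zeros(N, 1);
qErr(1001:1100) = 6;

useExternal    = false(N, 1);  % true where the external controller acts
external       = true;         % start under external control
calmSteps      = 0;            % consecutive steps with |qErr| < qTol
nInterventions = 0;
isDone         = false;        % would end the episode (plus a penalty)

for k = 1:N
    if k <= nInit
        external = true;                 % initial supervised phase
    elseif k == nInit + 1
        external  = false;               % hand control to the agent
        calmSteps = 0;
    end
    if k > nInit
        if external
            % Hand back once the pitch rate has stayed near zero long enough
            if abs(qErr(k)) < qTol
                calmSteps = calmSteps + 1;
            else
                calmSteps = 0;
            end
            if calmSteps >= nSettle
                external  = false;
                calmSteps = 0;
            end
        elseif abs(qErr(k)) >= qErrMax
            % Error too large: the external controller intervenes again
            external       = true;
            nInterventions = nInterventions + 1;
            isDone         = nInterventions > maxInterventions;
        end
    end
    useExternal(k) = external;
    if isDone
        break;
    end
end
```

With this signal, `useExternal` is true for the first 75 s, false until the disturbance at 100 s, true again until the pitch rate has been near zero for about 40 s after the disturbance ends, and false afterwards, with one counted intervention. Counting integer steps rather than comparing times from `0:Ts:T` avoids handovers shifting by one step because of rounding.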

I have also introduced a maximum number of allowed interventions: if the agent exceeds it, the simulation stops and a penalty is applied. I also apply a penalty every time the external controller has to intervene again, and give a bonus every time the agent makes progress within the time window in which it is left alone. These bonuses and penalties are added to the standard reward, which accounts for the altitude error, the flight path angle error, and the pitch rate error. The weights for these errors are 1, 1, and 10, respectively, because I want to emphasize that the aircraft must keep a steady, level attitude (pitch rate close to zero).
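A minimal sketch of such a shaped reward. The post does not state the exact form of the standard reward, so a negative weighted sum of squared errors is assumed here, and the magnitudes of the penalties and the bonus are placeholders. The names `interventionPenalty`, `progressBonus`, and `terminalPenalty` are hypothetical.

```matlab
% Weights on altitude, flight path angle, and pitch rate errors (from the post)
w = [1 1 10];

% Assumed form of the standard reward: negative weighted sum of squared errors
baseReward = @(eh, egamma, eq) -(w(1)*eh.^2 + w(2)*egamma.^2 + w(3)*eq.^2);

% Hypothetical shaping magnitudes (not from the post)
interventionPenalty = 10;   % each time the external controller steps in again
progressBonus       = 1;    % when the agent improves while left alone
terminalPenalty     = 100;  % when the intervention cap is exceeded

% Logical flags are converted to 0/1 by the arithmetic
shapedReward = @(eh, egamma, eq, intervened, progressed, capExceeded) ...
    baseReward(eh, egamma, eq) ...
    - interventionPenalty*intervened ...
    + progressBonus*progressed ...
    - terminalPenalty*capExceeded;

r1 = shapedReward(50, 0, 0, false, false, false);  % 50 m altitude error only
r2 = shapedReward(0, 0, 5, false, false, false);   % 5 deg/s pitch rate error only
fprintf('r1 = %g, r2 = %g\n', r1, r2);
```

If the errors enter in raw units (m, deg, deg/s), a 50 m altitude error alone gives -2500 while a 5 deg/s pitch rate error alone gives -250, so the altitude term can dominate despite the weight of 10 on pitch rate. Normalizing each error by a typical magnitude before weighting is one way to make the weights express the intended priorities.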

The initial conditions are always random, and the setpoint for altitude is always set 50 meters above the initial altitude.
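A minimal sketch of the randomized initialization with the 50 m altitude setpoint. The ranges below are hypothetical, since the post does not give them. In a Simulink environment, code like this would typically live in the environment's reset function, for example passing the values to the model with `setVariable` on the `Simulink.SimulationInput` object; that wiring is not shown here.

```matlab
% Hypothetical ranges for the random initial conditions (not from the post)
h0Range     = [1000 3000];   % initial altitude (m)
gamma0Range = [-2 2];        % initial flight path angle (deg)
q0Range     = [-1 1];        % initial pitch rate (deg/s)

unifIn = @(r) r(1) + diff(r)*rand;   % uniform sample in [r(1), r(2)]

h0     = unifIn(h0Range);
gamma0 = unifIn(gamma0Range);
q0     = unifIn(q0Range);
hRef   = h0 + 50;                    % altitude setpoint: 50 m above the start
```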

Unfortunately, I haven’t seen any progress after the first training session. In your opinion, is it worth making another attempt, or is the whole setup wrong? Thank you.

6 Comments (4 older comments not shown)

Leonardo Molino on 4 Aug 2024

Hi @Umar, thanks for answering my question. Is it actually possible to implement such a training loop in MATLAB? Anyway, my agent setup is the following:

% Build and configure the agent
sample_time          = 0.1; %(s)
delta_e_action_range = delta_e_UL - delta_e_LL;   % same as abs(LL) + UL when LL <= 0
delta_e_std_dev      = (0.08*delta_e_action_range)/sqrt(sample_time)
delta_T_action_range = delta_T_UL - delta_T_LL;   % also correct if the lower limit is > 0
delta_T_std_dev      = (0.08*delta_T_action_range)/sqrt(sample_time)
std_dev_decayrate = 1e-6;
create_new_agent = false;
if create_new_agent
    new_agent_opt = rlDDPGAgentOptions;
    new_agent_opt.SampleTime                                 = sample_time;
    new_agent_opt.NoiseOptions.StandardDeviation             = [delta_e_std_dev; delta_T_std_dev];
    new_agent_opt.NoiseOptions.StandardDeviationDecayRate    = std_dev_decayrate;
    new_agent_opt.ExperienceBufferLength                     = 1e6;
    new_agent_opt.MiniBatchSize                              = 256;
    new_agent_opt.ResetExperienceBufferBeforeTraining        = create_new_agent;
    Alt_STEP_Agent = rlDDPGAgent(obsInfo, actInfo, new_agent_opt)
    % get the actor
    actor           = getActor(Alt_STEP_Agent);
    actorNet        = getModel(actor);
    actorLayers     = actorNet.Layers;
    % configure the learning
    learnOptions = rlOptimizerOptions("LearnRate",1e-06,"GradientThreshold",1);
    actor.UseDevice = 'cpu';
    new_agent_opt.ActorOptimizerOptions = learnOptions;
    % get the critic
    critic          = getCritic(Alt_STEP_Agent);
    criticNet       = getModel(critic);
    criticLayers    = criticNet.Layers;
    % configure the critic
    critic.UseDevice = 'gpu';
    new_agent_opt.CriticOptimizerOptions = learnOptions;
    Alt_STEP_Agent = rlDDPGAgent(actor, critic, new_agent_opt);
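else
    % Continue from a previously saved agent. The options computed above
    % (noise, buffer, learn rates) are NOT applied to this agent.
    load('Train2_Agent450.mat')   % expected to define saved_agent
    previously_trained_agent = saved_agent;
    Alt_STEP_Agent = previously_trained_agent;   % same name as in the branch above
    actor     = getActor(previously_trained_agent);
    actorNet  = getModel(actor);
    critic    = getCritic(previously_trained_agent);
    criticNet = getModel(critic);
end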
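A rough check of what these noise settings imply over a long training run. It assumes the noise standard deviation is multiplied by `(1 - StandardDeviationDecayRate)` at every agent step, with no minimum reached, which is how the Reinforcement Learning Toolbox noise options describe the decay. A 300 s episode at 0.1 s gives 3000 steps per episode. The elevator limits are hypothetical, since the values of `delta_e_LL` and `delta_e_UL` are not given in the post.

```matlab
Ts              = 0.1;              % sample_time (s), from the post
decayRate       = 1e-6;             % std_dev_decayrate, from the post
stepsPerEpisode = round(300/Ts);    % 300 s episode (75 s is 1/4 of it)
nEpisodes       = [100 450 1000];   % 450 matches 'Train2_Agent450.mat'

% Hypothetical elevator limits (deg), for illustration only
delta_e_LL = -25;
delta_e_UL =  25;
delta_e_std_dev = 0.08*(delta_e_UL - delta_e_LL)/sqrt(Ts);   % initial std (deg)

% Fraction of the initial standard deviation left after each episode count,
% assuming std <- (1 - decayRate)*std at every agent step
remaining = (1 - decayRate).^(stepsPerEpisode*nEpisodes);

fprintf('initial elevator std: %.2f deg\n', delta_e_std_dev);
fprintf('after %4d episodes: %.1f%% of initial std\n', [nEpisodes; 100*remaining]);
```

Under these assumptions, roughly a quarter (about 26%) of the initial exploration noise is still present after 450 episodes, so exploration remains substantial at that stage of training.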

Umar on 6 Aug 2024

Hi @Leonardo Molino,

My suggestion would be to start by checking the learning rates, the network complexity, and the quality of the training data. Additionally, monitor the loss curves and rewards during training, which can give you insight into the model's performance. Hope this helps.
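As a concrete version of the advice to monitor rewards, a minimal sketch that compares the average episode reward over the first and last few episodes. The rewards below are synthetic stand-in data, and the window length `W` and the 5% criterion are arbitrary assumptions. In practice the vector would come from your own training run, for example the `EpisodeReward` field of the statistics returned by `train`; comparing it with `EpisodeQ0` can also indicate whether the critic is learning.

```matlab
W = 20;                                    % averaging window (episodes), assumed
episodeReward = -1000 + 50*sin(1:450);     % synthetic, flat-but-noisy rewards

early = mean(episodeReward(1:W));          % average of the first W episodes
late  = mean(episodeReward(end-W+1:end));  % average of the last W episodes
improvement = late - early;
isLearning  = improvement > 0.05*abs(early);   % hypothetical 5% criterion

fprintf('early %.1f, late %.1f, learning: %d\n', early, late, isLearning);
```

A flat curve like this one, where the late average is within noise of the early one, is the pattern described in the question.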

Leonardo Molino on 6 Aug 2024

@Umar Thank you, I will try.

Answers (0)
