Training DDPG agent with custom training loop

11 views (last 30 days)
平成 on 31 May 2025
Answered: Hitesh on 3 Jun 2025
Currently, I am designing a control system using deep reinforcement learning (DDPG) with Reinforcement Learning Toolbox in MATLAB/Simulink. Specifically, I need to implement a custom training loop that does not rely on the train function. Could you please show me how to implement a custom training loop for training a DDPG agent? I would like to understand how to implement a standard DDPG-based control system using a custom training loop in MATLAB.
Below is the MATLAB code I currently use to train a DDPG agent with the train function. Could you convert it into a version that uses a custom training loop (without train)?
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "control input";
mdl = 'SIM_RL'; % Simulink model with plant and RL Agent block
env = rlSimulinkEnv( ...
    "SIM_RL", ...
    "SIM_RL/Agent/RL Agent", ...
    obsInfo, actInfo);
% Domain randomization: Reset function
env.ResetFcn = @(in)localResetFcn(in);
function in = localResetFcn(in)
    % Fixed range of plant parameter
    M_min = Nominal_value*(1 - 0.5); % -50% of nominal mass
    M_max = Nominal_value*(1 + 0.5); % +50% of nominal mass
    % Randomize mass
    randomValue_M = M_min + (M_max - M_min) * rand;
    in = setBlockParameter(in, ...
        "SIM_RL/Plant/Mass", ...
        Value=num2str(randomValue_M));
end
% The construction of the critic Network structure is omitted here.
% ....
criticNet = initialize(criticNet);
critic = rlQValueFunction(criticNet,obsInfo,actInfo);
% The construction of the actor Network structure is omitted here.
% ....
actorNet = initialize(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Set-up agent
criticOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
actorOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
agentOpts = rlDDPGAgentOptions( ...
    SampleTime=0.01, ...
    CriticOptimizerOptions=criticOpts, ...
    ActorOptimizerOptions=actorOpts, ...
    ExperienceBufferLength=1e5, ...
    DiscountFactor=0.99, ...
    MiniBatchSize=128, ...
    TargetSmoothFactor=1e-3);
agent = rlDDPGAgent(actor,critic,agentOpts);
maxepisodes = 5000;
maxsteps = ceil(Simulation_End_Time/0.01);
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps, ...
    ScoreAveragingWindowLength=5, ...
    Verbose=true, ...
    Plots="training-progress", ...
    StopTrainingCriteria="EpisodeCount", ...
    SaveAgentCriteria="EpisodeReward", ...
    SaveAgentValue=-1.0);
doTraining = true;
if doTraining
    evaluator = rlEvaluator( ...
        NumEpisodes=1, ...
        EvaluationFrequency=5);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts,Evaluator=evaluator);
else
    % Load the pretrained agent
    load("agent.mat","agent")
end

Answers (1)

Hitesh on 3 Jun 2025
Hi 平成,
The following shows how to convert your DDPG agent training setup from the "train" function into a custom training loop in MATLAB. A custom loop gives you greater control over training, evaluation, logging, and integration with domain randomization.
The main components of a custom training loop are:
  • Environment Reset: Start each episode by resetting the environment.
  • Action Selection: Use the actor network to select an action based on the current observation.
  • Environment Step: Apply the action to the environment (e.g., via sim for Simulink models) and collect the next observation, reward, and done flag.
  • Experience Storage: Store the transition (state, action, reward, next state, done) in a replay buffer.
  • Learning: Sample mini-batches from the buffer and perform gradient updates on the actor and critic networks.
  • Target Updates: Soft update the target networks (actor and critic) toward the main networks (a manual sketch of this step appears after the example loop below).
  • Logging & Evaluation: Track performance (e.g., cumulative reward) and optionally evaluate the agent periodically.
Kindly refer to the following custom training loop as an example.
% Create agent
agent = rlDDPGAgent(actor, critic, agentOpts);
% Experience buffer (the agent's own replay memory)
buffer = agent.ExperienceBuffer;
% Logging (maxEpisodes and maxStepsPerEpisode correspond to
% maxepisodes and maxsteps in your original script)
episodeRewards = zeros(maxEpisodes,1);
% For a Simulink environment, set it up once before stepping it in a loop
% (depending on your release, runEpisode may be required instead of step)
setup(env);
% Custom training loop
for episode = 1:maxEpisodes
    % Reset environment and agent at the start of each episode
    initialObs = reset(env);
    reset(agent);
    % Track episode reward
    totalReward = 0;
    for stepCount = 1:maxStepsPerEpisode
        % Get action from agent for the current observation
        action = getAction(agent, initialObs);
        % Step the environment
        [nextObs, reward, isDone, ~] = step(env, action);
        % Store experience in the replay memory
        % (depending on the release, the experience may instead need to be a
        % structure with Observation, Action, Reward, NextObservation, IsDone fields)
        experience = rlExperience(initialObs, action, reward, nextObs, isDone);
        append(buffer, experience);
        % Learn once enough samples are available: the agent samples a
        % mini-batch and updates the critic, actor, and target networks
        if buffer.NumExperiences >= agentOpts.MiniBatchSize
            learn(agent, buffer);
        end
        % Update state and reward
        initialObs = nextObs;
        totalReward = totalReward + reward;
        if isDone
            break;
        end
    end
    % Log reward
    episodeRewards(episode) = totalReward;
    fprintf("Episode %d: Total Reward = %.2f\n", episode, totalReward);
    % Optional: periodically save the agent
    if mod(episode, 50) == 0
        save(sprintf('agent_episode_%d.mat', episode), 'agent');
    end
end
% Release the environment when training is finished
cleanup(env);
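If you later want to replace learn(agent, buffer) with fully manual updates, the "Learning" and "Target Updates" steps listed above have to be written out yourself. The following is a minimal sketch of the soft target update only, assuming you maintain your own targetCritic and targetActor approximators (initialized as copies of critic and actor) and a smoothing factor tau; it is an illustration of the pattern, not the agent's internal implementation.
% Minimal sketch of a manual soft target update. Assumes targetCritic and
% targetActor are kept separately, initialized as copies of critic and actor,
% and that tau plays the role of TargetSmoothFactor.
tau = 1e-3;
% Soft-update the target critic toward the current critic
criticParams       = getLearnableParameters(critic);
targetCriticParams = getLearnableParameters(targetCritic);
for i = 1:numel(criticParams)
    targetCriticParams{i} = tau*criticParams{i} + (1 - tau)*targetCriticParams{i};
end
targetCritic = setLearnableParameters(targetCritic, targetCriticParams);
% Soft-update the target actor toward the current actor
actorParams       = getLearnableParameters(actor);
targetActorParams = getLearnableParameters(targetActor);
for i = 1:numel(actorParams)
    targetActorParams{i} = tau*actorParams{i} + (1 - tau)*targetActorParams{i};
end
targetActor = setLearnableParameters(targetActor, targetActorParams);
In a fully manual loop you would apply this after each critic/actor gradient step; the mini-batch itself can be drawn from the replay memory (for example with its sample method) before computing the critic target and the actor loss.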
For more information regarding the DDPG training algorithm, kindly refer to the MATLAB documentation on deep deterministic policy gradient (DDPG) agents.
