Resume training for PPO agent

Question

Harry Dunn on 8 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1943714-resume-training-for-ppo-agent

Answered: Emmanouil Tzorakoleftherakis on 10 Apr 2023

I am trying to train a PPO agent whose environment is essentially a text file of data read in from a robotics dynamics simulator (Webots). This works, but random CPU spikes cause a crash because the simulator and MATLAB have to run simultaneously (although it typically completes at least a few thousand episodes before crashing).

I have used the following link to save the agent after every episode and then I reload the agent and re-run: https://uk.mathworks.com/matlabcentral/answers/495436-how-to-train-further-a-previously-trained-agent

use_previous_agent=true;
if use_previous_agent
    % Load the agent saved by a previous training run
    load("Filepath...",'saved_agent');
    agent = saved_agent;
else
    % Create a new agent
    agent = rlPPOAgent(actor,critic,agentOpts);
    agent.AgentOptions.CriticOptimizerOptions.LearnRate = 3e-3;
    agent.AgentOptions.ActorOptimizerOptions.LearnRate = 3e-3;
    
end
trainOpts = rlTrainingOptions(...
    MaxEpisodes=100000,...
    MaxStepsPerEpisode=600000,...
    Plots="training-progress",...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=4300,...
    ScoreAveragingWindowLength=100, ...
    SaveAgentCriteria="EpisodeCount", ...
    SaveAgentValue=10, ...
    SaveAgentDirectory=fullfile(pwd,"run1","Agents"));
trainingStats = train(agent, env, trainOpts);

I'm not sure this is correct, because the linked answer is specifically about DDPG, where you have to deal with resetting the experience buffer, etc. Does anyone with experience with PPO agents know whether this is a viable process?

Thanks in advance

Answer 1

Emmanouil Tzorakoleftherakis on 10 Apr 2023

https://kr.mathworks.com/matlabcentral/answers/1943714-resume-training-for-ppo-agent#answer_1212979

PPO does not use an experience buffer, so loading the saved agent and resuming training should work fine. If you are using advantage normalization, though, the information it accumulated in the previous run won't carry over to the new training session.
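Since the crash can happen at any episode, it may help to pick up the most recent saved agent automatically instead of typing the file path by hand. The following is only a minimal sketch: findLatestAgentFile is a hypothetical helper, not a Reinforcement Learning Toolbox function, and it assumes the agents in the save directory are named Agent<episode>.mat (for example Agent10.mat), with the variable saved_agent inside as in the question's load call.

```matlab
% findLatestAgentFile.m
% Hypothetical helper (not part of Reinforcement Learning Toolbox).
% Returns the full path of the saved agent file with the highest episode
% number in saveDir, or "" if no matching file is found.
% Assumption: files are named Agent<episode>.mat, e.g. Agent10.mat.
function latestFile = findLatestAgentFile(saveDir)
    listing = dir(fullfile(saveDir, "Agent*.mat"));
    latestFile = "";
    bestEpisode = -Inf;
    for k = 1:numel(listing)
        % Extract the episode number from the file name
        tok = regexp(listing(k).name, '^Agent(\d+)\.mat$', 'tokens', 'once');
        if isempty(tok)
            continue  % skip files that do not match Agent<number>.mat
        end
        episode = str2double(tok{1});
        if episode > bestEpisode
            bestEpisode = episode;
            latestFile = string(fullfile(listing(k).folder, listing(k).name));
        end
    end
end
```

In the question's script, the hard-coded load call could then be replaced by latestFile = findLatestAgentFile(fullfile(pwd,"run1","Agents")); followed by load(latestFile,"saved_agent"); whenever latestFile is not empty, falling back to creating a new agent otherwise.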
