PPO Agent training - Is it possible to control the number of epochs dynamically?

Question

Federico Toso, 17 March 2024

https://kr.mathworks.com/matlabcentral/answers/2095436-ppo-agent-training-is-it-possible-to-control-the-number-of-epochs-dynamically

Commented: Federico Toso, 25 March 2024

In the default implementation of the PPO agent in MATLAB, the number of epochs is a static property that must be selected before training starts.

However, I've seen that state-of-the-art implementations of PPO sometimes select the number of epochs dynamically: in each learning phase, the algorithm decides whether to run another epoch based on the KL divergence it has just computed. This seems to improve the robustness of the algorithm significantly.

Is it possible for a user to implement such a routine in MATLAB in the context of PPO training, possibly by making slight modifications to the default process?


Answer 1

Kartik Saxena, 22 March 2024

https://kr.mathworks.com/matlabcentral/answers/2095436-ppo-agent-training-is-it-possible-to-control-the-number-of-epochs-dynamically#answer_1429461


Hi,

The following pseudocode sketches the logic you can use for this purpose:

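% Pseudocode: collectExperiences, updateAgent, getPolicy and
% calculateKLDivergence are placeholders for code you write yourself.
% Assume env is your environment and agent is your PPO agent.
for episode = 1:maxEpisodes
    experiences = collectExperiences(env, agent);
    oldPolicy = getPolicy(agent);   % policy that generated the experiences
    klDivergence = 0;
    epochCount = 0;

    % Run another epoch only while the updated policy stays close to the
    % policy that collected the data, and the epoch cap is not reached.
    while klDivergence < klThreshold && epochCount < maxEpochs
        agent = updateAgent(agent, experiences);
        newPolicy = getPolicy(agent);

        % Compare against the data-collecting policy, not the previous epoch
        klDivergence = calculateKLDivergence(oldPolicy, newPolicy);
        epochCount = epochCount + 1;
    end
end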
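
The calculateKLDivergence step is left open in the loop above. As a minimal sketch, assuming a discrete action space and assuming you can obtain the action probabilities of both policies for the same batch of observations, it could look like the following. The name meanCategoricalKL and the numActions-by-batchSize layout are illustrative choices, not toolbox conventions.

```matlab
% Example inputs: 2 actions, 2 observations (each column sums to 1)
pOld = [0.5 0.9; 0.5 0.1];   % probabilities under the data-collecting policy
pNew = [0.5 0.8; 0.5 0.2];   % probabilities under the updated policy
kl = meanCategoricalKL(pOld, pNew);
fprintf('Mean KL(old || new) = %.4f\n', kl);

% Hypothetical helper: mean KL(old || new) over a batch of observations.
% pOld, pNew: numActions-by-batchSize matrices of action probabilities.
function kl = meanCategoricalKL(pOld, pNew)
    epsilon = 1e-8;                      % guards against log(0)
    pOld = max(pOld, epsilon);
    pNew = max(pNew, epsilon);
    klPerSample = sum(pOld .* (log(pOld) - log(pNew)), 1);  % one value per column
    kl = mean(klPerSample);              % average over the batch
end
```

For continuous (for example Gaussian) policies the formula differs, so this helper would need to be replaced accordingly.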

Additionally, you can refer to the following documentation pages and examples when writing your custom implementation of the PPO agent:

https://www.mathworks.com/help/reinforcement-learning/ref/rl.env.rlfunctionenv.html

https://www.mathworks.com/help/reinforcement-learning/ug/train-reinforcement-learning-policy-using-custom-training.html
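
Before building a full custom training loop along the lines of those pages, the stopping rule itself can be tried on a toy problem in base MATLAB. The sketch below is not a PPO update (there is no clipping, critic, or environment). It only shows how the epoch loop ends once the KL divergence from the data-collecting policy reaches a threshold. All numbers (advantage, stepSize, klTarget, maxEpochs) are made up for illustration.

```matlab
% Toy illustration of KL-based early stopping (not a full PPO update)
theta     = zeros(3, 1);     % logits of a 3-action softmax policy
advantage = [1; 0; -1];      % made-up advantages, one per action
stepSize  = 0.5;             % hypothetical learning rate
klTarget  = 0.01;            % hypothetical KL threshold
maxEpochs = 20;              % upper bound on epochs per learning phase

pOld = softmaxCol(theta);    % policy that "collected" the data
epochCount = 0;
kl = 0;
while kl < klTarget && epochCount < maxEpochs
    p = softmaxCol(theta);
    % Gradient of the expected advantage sum(p .* advantage) w.r.t. the logits
    grad = p .* (advantage - p' * advantage);
    theta = theta + stepSize * grad;
    pNew = softmaxCol(theta);
    kl = sum(pOld .* (log(pOld) - log(pNew)));   % KL(old || new)
    epochCount = epochCount + 1;
end
fprintf('Stopped after %d epoch(s), KL = %.4f\n', epochCount, kl);

% Softmax of a column vector of logits
function p = softmaxCol(z)
    z = z - max(z);          % shift for numerical stability
    p = exp(z) ./ sum(exp(z));
end
```

With this rule the number of epochs actually run varies from one learning phase to the next, while maxEpochs keeps it bounded.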

I hope it helps!


Federico Toso, 25 March 2024

Thank you!
