Tuning ExperienceHorizon hyperparameter for PPO agent (Reinforcement Learning)

Question

Nicolas CRETIN on 18 Jul 2024

https://kr.mathworks.com/matlabcentral/answers/2138486-tuning-experiencehorizon-hyperparamter-for-ppo-agent-reinforcement-learning

Commented: Nicolas CRETIN on 2 Aug 2024

Hello everyone,

I'm trying to train a PPO agent, and I would like to change the value of the ExperienceHorizon hyperparameter (Options for PPO agent - MATLAB - MathWorks Switzerland).

When I use any value other than the default, the agent waits until the end of the episode to update its policy. For example, ExperienceHorizon=1024 doesn't work for me, even though the episode is longer than 1024 steps. I'm also not using parallel training.

I also get the same issue if I change the MiniBatchSize from its default value.

Is there anything I've missed about this parameter?

More information on the PPO algorithm: Proximal Policy Optimization (PPO) Agents - MATLAB & Simulink - MathWorks Switzerland

If anyone could help, that would be very nice!

Thanks a lot in advance,

Nicolas

Answer 1

Alan on 1 Aug 2024

https://kr.mathworks.com/matlabcentral/answers/2138486-tuning-experiencehorizon-hyperparamter-for-ppo-agent-reinforcement-learning#answer_1493366

Edited: Alan on 1 Aug 2024

Hi Nicolas,

I could not figure out how to record the episode or step index at which the agent's policy is updated, so I could not verify the behaviour of the various combinations of options.

From my understanding, these are the possible reasons for the policy being updated late:

1. During the training phase, none of the episodes reached two ExperienceHorizons' worth of steps because termination conditions were reached early. The policy update might therefore be happening with a combination of steps from different episodes.
2. The MaxStepsPerEpisode parameter in the training options could be less than ExperienceHorizon, thereby causing the episode to terminate before the horizon is reached. By training options I mean the ones that are passed to the train() function via an argument list, or via an rlTrainingOptions object (https://www.mathworks.com/help/releases/R2023b/reinforcement-learning/ref/rl.option.rltrainingoptions.html).
3. The MiniBatchSize parameter defines the size of the chunks that the experience buffer is divided into before running an epoch of training on the policy network. If ExperienceHorizon is less than MiniBatchSize, it could cause issues. So, ensure that ExperienceHorizon is a multiple of MiniBatchSize.
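
The option checks in the list above could be sketched roughly as follows. This is only an illustration under assumptions: it requires Reinforcement Learning Toolbox, the numeric values (1024, 128, 500 episodes, a 4x step limit) are made up for the example, and the divisibility check encodes the suggestion from this answer rather than a documented requirement.

```matlab
% Sketch only: all numeric values are illustrative assumptions.
horizon   = 1024;   % steps collected before each policy update
miniBatch = 128;    % chosen so that it divides the horizon evenly

% PPO agent options with non-default horizon and mini-batch size
agentOpts = rlPPOAgentOptions( ...
    'ExperienceHorizon', horizon, ...
    'MiniBatchSize',     miniBatch);

% Training options: allow episodes to run longer than the horizon
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',        500, ...
    'MaxStepsPerEpisode', 4 * horizon);

% Consistency checks suggested in this answer (assumed, not verified)
assert(mod(agentOpts.ExperienceHorizon, agentOpts.MiniBatchSize) == 0, ...
    'ExperienceHorizon should be a multiple of MiniBatchSize.');
assert(trainOpts.MaxStepsPerEpisode >= agentOpts.ExperienceHorizon, ...
    'MaxStepsPerEpisode is shorter than ExperienceHorizon.');
```

If either assertion fails, the corresponding item in the list is a candidate explanation for the late updates. Passing both does not rule out early termination by the environment itself (the first item).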

I hope this helped.

-Alan

Nicolas CRETIN on 2 Aug 2024

Thank you Alan!
