PPO minibatch size for parallel training with variable number of steps

Question

Federico Toso on 23 February 2024

https://kr.mathworks.com/matlabcentral/answers/2085958-ppo-minibatch-size-for-parallel-training-with-variable-number-of-steps

Answered: Emmanouil Tzorakoleftherakis on 26 February 2024

I'm training a PPO Agent in sync parallelization mode.

Because of the nature of my environment, the number of steps varies from episode to episode, sometimes wildly. Quoting from the reference for PPO Agent Options:

"When the agent is trained in parallel, ExperienceHorizon is ignored, and the whole episode is used to compute the gradients"

I don't fully understand how the experiences collected during the episodes are divided into minibatches of the selected size before the learning phase begins. Specifically, suppose that:

I have 2 parallel synchronous workers: for a specific pair of episodes, the first one collects 30 experiences and the second one collects 70.
The minibatch size has been set to 32.

How are the experiences divided in minibatches?

As I understand it:

Each worker sends its own experiences to the client, so the client gathers 30 + 70 = 100 experiences.
These 100 experiences are divided into three minibatches of 32 each; the remaining 4 experiences are discarded, since 3 x 32 = 96.

Is my reasoning correct? If so, I guess the best way to limit the number of discarded experiences would be to make the minibatch size as small as possible, since I cannot know in advance the total number of experiences available in each iteration.
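The splitting the question hypothesizes (pool everything, cut full minibatches, drop the leftover) can be sketched in Python as below. This is an illustration of the questioner's assumption only, not the Reinforcement Learning Toolbox implementation; `split_drop_last` and the worker/experience tuples are hypothetical, and no shuffling is modeled.

```python
def split_drop_last(experiences, minibatch_size):
    """Hypothetical helper: cut pooled experiences into full minibatches
    and return the leftover experiences that would be discarded."""
    n_full = len(experiences) // minibatch_size
    batches = [
        experiences[i * minibatch_size:(i + 1) * minibatch_size]
        for i in range(n_full)
    ]
    discarded = experiences[n_full * minibatch_size:]
    return batches, discarded


# Hypothetical experience records: worker 1 collects 30, worker 2 collects 70.
worker1 = [("w1", t) for t in range(30)]
worker2 = [("w2", t) for t in range(70)]
pooled = worker1 + worker2  # the client gathers 100 experiences

batches, discarded = split_drop_last(pooled, 32)
print([len(b) for b in batches], len(discarded))  # [32, 32, 32] 4
```

Under this assumption the number of dropped experiences is `N mod M` for `N` pooled experiences and minibatch size `M`, which is always smaller than `M`; that bound is why shrinking the minibatch size would limit the loss.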


Answer 1

Emmanouil Tzorakoleftherakis on 26 February 2024

https://kr.mathworks.com/matlabcentral/answers/2085958-ppo-minibatch-size-for-parallel-training-with-variable-number-of-steps#answer_1416843

No data is actually discarded. As of R2023b, the 4 experiences left over in your example form their own minibatch and are used for learning like any other minibatch. Note that this behavior may change in future releases.
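A minimal Python sketch of the behavior described above for R2023b, where the leftover experiences become a final, smaller minibatch. It assumes the experiences are pooled as in the question; `split_keep_remainder` is a hypothetical helper, not the toolbox's internal code, and any shuffling the toolbox may perform is not modeled.

```python
def split_keep_remainder(experiences, minibatch_size):
    """Hypothetical helper: cut pooled experiences into minibatches,
    keeping a smaller final minibatch for any leftover experiences."""
    return [
        experiences[i:i + minibatch_size]
        for i in range(0, len(experiences), minibatch_size)
    ]


pooled = list(range(30 + 70))  # experiences from the two workers
batches = split_keep_remainder(pooled, 32)
print([len(b) for b in batches])  # [32, 32, 32, 4]
```

With this scheme every collected experience contributes to learning, so the minibatch size no longer needs to be shrunk to avoid losing data.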
