Minibatch construction for PPO agent in parallel synchronous mode

Federico Toso on 6 March 2024
Commented: Federico Toso on 13 March 2024
If I understood the documentation correctly, when a PPO agent is trained in parallel synchronous mode, each worker sends its own experiences back to the client, which then assembles the minibatches and calculates the gradients for the policy update. My question is: when constructing the minibatches, does the client attempt to sample experiences evenly from all the workers? For example, if I have 4 workers and my minibatch size is set to 32, does the client sample 8 experiences from each worker for every single minibatch?
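For reference, here is a minimal sketch of the setup being described, assuming an environment env and a PPO agent agent have already been created (both names are placeholders) and that Parallel Computing Toolbox is available:

% Minibatch size the client uses when computing policy updates
agent.AgentOptions.MiniBatchSize = 32;

% Start 4 workers
parpool(4);

% For PPO agents, parallel training is experience-based: workers send
% experiences to the client, which assembles minibatches and computes
% the gradients.
trainOpts = rlTrainingOptions(UseParallel=true);
trainingStats = train(agent, env, trainOpts);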

Answers (1)

Avadhoot on 12 March 2024
Hi Federico,
From your question I infer that you are concerned with how the minibatch is actually constructed from the experiences received from the workers. During parallel synchronous training, the client collects samples from all the workers, but the exact method depends on various factors such as the algorithm, the training settings, and other specified options. The documentation does not provide implementation-level details of this process.
What you have described, sampling an equal number of experiences from each worker, would be the ideal behaviour for the client. But this is not always practical. Some prominent factors to consider include the following:
  1. Experience Variability: Not all workers may generate the same number of experiences, because episode lengths vary with the policy's actions and the environment dynamics.
  2. Sampling Strategy: The actual implementation of the minibatch sampling strategy might not enforce strict equality in the number of experiences sampled from each worker.
If you require this behaviour for your application, you can implement it yourself with a custom training loop, as sketched below. This might not always be a good choice, though, since strictly even sampling is impractical in many cases.
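For instance, if you collect experiences yourself in a custom training loop (e.g., with runEpisode), a helper along these lines could enforce even sampling. The function name, and the assumption that workerBuffers is a cell array holding one experience struct array per worker, are purely illustrative and do not reflect the toolbox's internal data structures:

function minibatch = sampleEvenMinibatch(workerBuffers, miniBatchSize)
% Sample (nearly) evenly across workers: with 4 workers and
% miniBatchSize = 32, this draws 8 experiences from each buffer.
    numWorkers = numel(workerBuffers);
    perWorker  = floor(miniBatchSize/numWorkers);
    remainder  = miniBatchSize - perWorker*numWorkers;
    minibatch  = [];
    for k = 1:numWorkers
        n   = perWorker + (k <= remainder);   % spread any remainder
        buf = workerBuffers{k};
        n   = min(n, numel(buf));             % a short buffer yields fewer samples
        idx = randperm(numel(buf), n);        % sample without replacement
        minibatch = [minibatch, buf(idx)];    %#ok<AGROW>
    end
end

Note the min() guard: it is exactly the case the answer warns about, where one worker's episodes end early and its buffer cannot supply its full share of the minibatch.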
I hope this helps.
  1 Comment
Federico Toso on 13 March 2024
Thank you for the answer. I understand your points, since my application indeed has a variable number of experiences per episode.
If I use the default sampling strategy for my PPO agent, and the number of experiences per worker turns out to differ across the latest episodes (suppose I'm using synchronous mode), does the program at least TRY to use samples from all the workers when it assembles minibatches? Or does it favour sampling from a single worker?
Knowing this would help me decide whether it's worth writing a custom training loop for my application.
