PPO algorithm training problem in Reinforcement Learning Toolbox

The PPO training algorithm documentation states: "For each experience sequence that does not contain a terminal state, N is equal to the ExperienceHorizon option value. Otherwise, N is less than ExperienceHorizon and S_N is the terminal state."
My first question: suppose N is smaller than ExperienceHorizon and also smaller than the mini-batch size, and this happens for multiple consecutive episodes. When does the algorithm update the parameters in that case?
My second question: when are the PPO parameters updated with the following settings?
agentOpts = rlPPOAgentOptions(...
    'ExperienceHorizon',10000,...
    'MiniBatchSize',64,...
    'NumEpoch',3);              % other options omitted
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',10000,...
    'MaxStepsPerEpisode',30);   % other options omitted
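To make the mismatch in these settings concrete, here is the arithmetic I have in mind. This is only a sketch of the numbers, not a statement of what the toolbox does. The variable names are my own, and the two "episodes needed" figures assume, purely for illustration, that experiences could accumulate across episodes, which is exactly what I am unsure about.

```matlab
% Values copied from the options above.
experienceHorizon  = 10000;
miniBatchSize      = 64;
maxStepsPerEpisode = 30;

% An episode can contribute at most maxStepsPerEpisode experiences,
% so N <= 30 < miniBatchSize < experienceHorizon in every episode.
isShorterThanMiniBatch = maxStepsPerEpisode < miniBatchSize;   % true
isShorterThanHorizon   = maxStepsPerEpisode < experienceHorizon; % true

% Hypothetical: IF experiences accumulated across episodes, the minimum
% number of full-length episodes needed to reach each threshold would be:
episodesPerMiniBatch = ceil(miniBatchSize / maxStepsPerEpisode);     % 3
episodesPerHorizon   = ceil(experienceHorizon / maxStepsPerEpisode); % 334

fprintf('Episodes to fill one mini-batch: %d\n', episodesPerMiniBatch);
fprintf('Episodes to reach ExperienceHorizon: %d\n', episodesPerHorizon);
```

So a single episode can never fill even one mini-batch of 64, which is why I cannot tell from the documentation when the update is triggered.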