Advantage normalization for PPO Agent
When dealing with PPO agents, it is possible to set the "NormalizedAdvantageMethod" option to normalize the advantage function values for each mini-batch of experiences. The default value is "none".
While I can intuitively grasp that such a normalization may help reduce variance, I could not find any reference online that describes in sufficient detail when and why this procedure should be useful. My questions are:
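To make the operation concrete, here is a minimal Python sketch of per-mini-batch advantage normalization. It assumes the normalization is a standardization to zero mean and unit standard deviation, which is the common convention in PPO implementations; whether the toolbox does exactly this is an assumption, and the function name `normalize_current` and the `eps` guard are hypothetical.

```python
import statistics

def normalize_current(advantages, eps=1e-8):
    """Standardize one mini-batch of advantages (assumed convention)."""
    mean = statistics.fmean(advantages)
    std = statistics.pstdev(advantages)  # population std of this batch only
    # eps avoids division by zero when all advantages are equal
    return [(a - mean) / (std + eps) for a in advantages]

# Illustrative advantage values for a single mini-batch (made-up numbers)
batch = [0.5, 2.0, -1.0, 3.5]
normalized = normalize_current(batch)
```

After this step the mini-batch has roughly zero mean and unit spread, so the size of the policy-gradient step no longer depends on the raw scale of the rewards.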
1) Under which circumstances does the normalization of advantage function values turn out to be practically beneficial?
2) If I decide to normalize the advantage function values, are there situations where the "moving" option (which uses a restricted number of samples) can be more beneficial than the "current" option (which uses all of the current samples available)? Intuitively, I would say that the "current" option should always perform better.
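As a point of comparison for question 2, the following Python sketch shows one plausible reading of a "moving" normalization: the mean and standard deviation are computed over a sliding window of recent advantages rather than over the current mini-batch alone. The class name `MovingAdvantageNormalizer`, the `window` parameter, and the exact statistics used are assumptions for illustration, not the toolbox's documented behavior.

```python
from collections import deque
import statistics

class MovingAdvantageNormalizer:
    """Normalize advantages with statistics from a sliding window (assumed scheme)."""

    def __init__(self, window, eps=1e-8):
        self.buffer = deque(maxlen=window)  # oldest samples drop out automatically
        self.eps = eps

    def normalize(self, advantages):
        # Add the new mini-batch, then use window-wide statistics
        self.buffer.extend(advantages)
        mean = statistics.fmean(self.buffer)
        std = statistics.pstdev(self.buffer)
        return [(a - mean) / (std + self.eps) for a in advantages]

# Two made-up mini-batches whose advantages differ in scale by a factor of 10
batch_a = [1.0, -1.0, 1.0, -1.0]
batch_b = [10.0, -10.0, 10.0, -10.0]

normalizer = MovingAdvantageNormalizer(window=8)
out_a = normalizer.normalize(batch_a)
out_b = normalizer.normalize(batch_b)
```

Under this reading, the second mini-batch keeps part of its larger scale relative to the first, whereas normalizing each mini-batch on its own would rescale both to unit spread. Window statistics also pool more samples than a single small mini-batch, which may make the estimates of mean and standard deviation less noisy.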