PPO Agent - Initialization of actor and critic networks

13 views (last 30 days)
Federico Toso on 17 Mar 2024
Commented: Federico Toso on 20 Mar 2024
Whenever a PPO agent is initialized in MATLAB, the documentation states that the parameters of both the actor and the critic are set randomly. However, this is not the only possible choice: other initialization schemes exist (e.g., orthogonal initialization), and they can sometimes improve the agent's subsequent performance.
  • Is there a reason why random initialization was chosen as the default method here?
  • Is it possible to specify a different initialization method easily within the Reinforcement Learning Toolbox, without starting from scratch?

Accepted Answer

Venu on 19 Mar 2024
Random initialization can encourage early exploration by starting the policy and value functions from a non-deterministic state. It also requires no specific tuning or assumptions about the model architecture, which makes it a sensible default choice.
MATLAB's Reinforcement Learning Toolbox does not expose a high-level option for choosing the initialization method of the actor and critic networks inside a PPO agent (or any other agent).
So, as you mentioned regarding starting from scratch: when you create the actor and critic networks yourself with MATLAB's Deep Learning Toolbox (e.g., using layerGraph, dlnetwork, or similar functions), you can specify the initializer for each layer manually. After defining the networks with your desired initialization, you can then pass them to the PPO agent creation function.
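For illustration, here is a minimal, untested sketch of that workflow. It assumes a recent release (roughly R2022a or later) and a discrete-action environment env; the layer sizes and the choice of orthogonal initialization are placeholders, not a recommendation:
% Observation and action specifications from your own environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
obsDim = obsInfo.Dimension(1);
numAct = numel(actInfo.Elements);
% Critic network: every learnable layer gets an explicit initializer
criticNet = dlnetwork([
    featureInputLayer(obsDim)
    fullyConnectedLayer(64,WeightsInitializer="orthogonal")
    reluLayer
    fullyConnectedLayer(64,WeightsInitializer="orthogonal")
    reluLayer
    fullyConnectedLayer(1,WeightsInitializer="orthogonal")]);
% Actor network: one output score per discrete action
actorNet = dlnetwork([
    featureInputLayer(obsDim)
    fullyConnectedLayer(64,WeightsInitializer="orthogonal")
    reluLayer
    fullyConnectedLayer(64,WeightsInitializer="orthogonal")
    reluLayer
    fullyConnectedLayer(numAct,WeightsInitializer="orthogonal")]);
% Wrap the networks and hand them to the agent constructor
critic = rlValueFunction(criticNet,obsInfo);
actor = rlDiscreteCategoricalActor(actorNet,obsInfo,actInfo);
agent = rlPPOAgent(actor,critic);
Note that WeightsInitializer also accepts "glorot", "he", "narrow-normal", or a function handle of the form @(sz) ..., so a fully custom scheme is possible without touching the agent code itself.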
Hope this helps to an extent!

More Answers (0)
