What's the difference between getAction and predict in RL, and why do the results differ between agent and actor?
Hi all,
I am trying to import the neural network of my PPO actor via ONNX. I followed the steps shown in Train DDPG Agent with Pretrained Actor Network (adapted to PPO). I do not import a critic because my network is already trained and ready to be deployed. When I check the output of predict(...), it matches what I get in Python. However, getAction(agent,{testData}) and getAction(actor,{testData}) differ from predict(...) and even from each other. Moreover, they change every run even if the input is kept constant (for example, feeding an array of ones). Can someone clarify why the output of getAction changes between the agent and the actor, and why it does not match the output of the neural network?
Best regards,
Kevin
Here is the code I used and the results obtained:
agentAction = -0.9091
actorAction = -0.8572
predictImNN = 0.8436
actorNetwork = importONNXNetwork("C:\...\ppo_model.onnx",'TargetNetwork',"dlnetwork", "InputDataFormats",'BC');
actorNetwork = layerGraph(actorNetwork);
low_limit = transpose([0.0, -pi, -20000.0, -20000.0, -1.5, -20000, -20000, -2, -3, -3.5, -4]);
upper_limit = transpose([20.0, pi, 20000.0, 20000.0, 1.5, 20000, 20000, 2, 3, 3.5, 4]);
obsInfo = rlNumericSpec([11 1], 'LowerLimit',low_limit, 'UpperLimit',upper_limit);
actInfo = rlNumericSpec([1 1],'LowerLimit',-0.18,'UpperLimit',0.18);
% Code generation does not support the ONNX custom layers, so remove them and reconnect the graph
actorNetwork = removeLayers(actorNetwork, 'onnx__Gemm_0_BatchSizeVerifier');
actorNetwork = removeLayers(actorNetwork, 'x25Output');
actorNetwork = removeLayers(actorNetwork, 'x26Output');
actorNetwork = connectLayers(actorNetwork, 'onnx__Gemm_0', 'Gemm_0');
% Get the names of the layers required to generate the actor
netMeanActName = actorNetwork.Layers(12).Name;
netStdActName = actorNetwork.Layers(13).Name;
netObsNames = actorNetwork.Layers(1).Name;
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo,'ActionMeanOutputNames', netMeanActName, 'ActionStandardDeviationOutputNames', netStdActName, 'ObservationInputNames', netObsNames);
agent = rlPPOAgent(obsInfo, actInfo);
agent = setActor(agent, actor);
% Check that the actor used by the agent matches the imported network. To do so, evaluate the agent, the actor, and the underlying network using the same input observation.
testData = ones(11,1);
% Evaluate the agent and the actor
agentAction = getAction(agent,{testData})
actorAction = getAction(actor,{testData})
% Evaluate the imported network directly
predictImNN = predict(getModel(getActor(agent)),dlarray(testData','BC'))
Accepted Answer
Ari Biswas
26 January 2023
The PPO agent with continuous action space has a stochastic policy. The network has two outputs: mean and standard deviation.
Calling getAction on the agent or the actor returns an action sampled from the policy using the mean and standard deviation outputs of the network. That is why the result changes every run: a new random sample is drawn each time.
Calling predict on the network gives you the mean and standard deviation values. You should call [mean,std] = predict(...) to get both outputs.
Also, you must ensure that you are comparing from the same random number generator state. For example, execute rng(0) before evaluating the networks each time.