What's the difference between getAction and predict in RL, and why do the results differ between agent and actor?
Hi all,
I am trying to import the neural network of my PPO actor via ONNX. I followed the steps shown in Train DDPG Agent with Pretrained Actor Network (adapted to PPO). I do not import a critic because my network is already trained and ready to be deployed. When I check the output of predict(...), it matches what I get in Python. However, getAction(agent,{testData}) and getAction(actor,{testData}) differ from predict(...) and even from each other. Moreover, they change every run even if the input is kept constant (for example, feeding an array of ones). Can someone clarify why the output of getAction changes between the agent and the actor, and why it does not match the output of the neural network?
Best regards,
Kevin
Here is the code I used and the results obtained:
agentAction = -0.9091
actorAction = -0.8572
predictImNN = 0.8436
actorNetwork = importONNXNetwork("C:\...\ppo_model.onnx",'TargetNetwork',"dlnetwork", "InputDataFormats",'BC');
actorNetwork = layerGraph(actorNetwork);
low_limit = transpose([0.0, -pi, -20000.0, -20000.0, -1.5, -20000, -20000, -2, -3, -3.5, -4]);
upper_limit = transpose([20.0, pi, 20000.0, 20000.0, 1.5, 20000, 20000, 2, 3, 3.5, 4]);
obsInfo = rlNumericSpec([11 1], 'LowerLimit',low_limit, 'UpperLimit',upper_limit);
actInfo = rlNumericSpec([1 1],'LowerLimit',-0.18,'UpperLimit',0.18);
% Code generation does not support the ONNX custom layers, so remove them and reconnect the graph
actorNetwork = removeLayers(actorNetwork, 'onnx__Gemm_0_BatchSizeVerifier');
actorNetwork = removeLayers(actorNetwork, 'x25Output');
actorNetwork = removeLayers(actorNetwork, 'x26Output');
actorNetwork = connectLayers(actorNetwork, 'onnx__Gemm_0', 'Gemm_0');
% Get the names of the layers required to generate the actor
netMeanActName = actorNetwork.Layers(12).Name;
netStdActName = actorNetwork.Layers(13).Name;
netObsNames = actorNetwork.Layers(1).Name;
actor = rlContinuousGaussianActor(actorNetwork,obsInfo,actInfo,'ActionMeanOutputNames', netMeanActName, 'ActionStandardDeviationOutputNames', netStdActName, 'ObservationInputNames', netObsNames);
agent = rlPPOAgent(obsInfo, actInfo);
agent = setActor(agent, actor);
% Check that the actor used by the agent matches the imported network. To do so, evaluate the agent, the actor, and the underlying network using the same input observation.
testData = ones(11,1);
% Evaluate the agent and the actor
agentAction = getAction(agent,{testData})
actorAction = getAction(actor,{testData})
% Evaluate the imported network directly
predictImNN = predict(getModel(getActor(agent)),dlarray(testData','BC'))
Accepted Answer
Ari Biswas
26 January 2023
The PPO agent with continuous action space has a stochastic policy. The network has two outputs: mean and standard deviation.
Calling getAction on the agent or the actor returns an action sampled from the policy using the mean and standard deviation outputs of the network. That is why the result changes every run: a new random sample is drawn each time.
Calling predict on the network gives you the mean and standard deviation values. You should call [mean,std] = predict(...) to get both outputs.
Also, you must ensure that you are comparing from the same random number generator state. For example, execute rng(0) before evaluating the networks each time.