Output of generated policy differs from trained agent
Dear Mathworks Team,
I have trained a DDPG agent that receives 2 observations.
Using the approach described in:
I generated a function evaluatePolicy.m that accepts an input of shape (2,1,1) and outputs a scalar. However, its output differs from that of my agent during training.
During training, the following lines define the action properties in the environment and training setup (createSineAgent.m), not in the neural-network definition of the agent (createDDGPNetworks.m):
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
This prevents actions greater than 1 or smaller than 0 from being applied: the output during training always stays between 0 and 1 and is clipped at those bounds.
However, the output of the generated evaluatePolicy.m appears to range between -1 and 1, not between 0 and 1. Why is that?
Examples:
>> evaluatePolicy(reshape([-0.1515581,-0.1515581],2,1,1))
ans = 0.9986
>> evaluatePolicy(reshape([-0.1515581,-0.6],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,100],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,-100],2,1,1))
ans = -1
I was expecting the output to be between 0 and 1, as defined by:
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
Does the approach described in:
not take the actionInfo into account?
The output of
>> type evaluatePolicy.m
is:
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 22-Sep-2021 19:49:51
action1 = localEvaluate(observation1);
end
%% Local Functions
function action1 = localEvaluate(observation1)
persistent policy
if isempty(policy)
policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action1 = predict(policy, observation1);
end
whereas the referenced documentation states that the generated code should look more like:
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 23-Feb-2021 18:52:32
actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:)';
probabilities = predict(policy, observation1);
end
In this second version I can see a parameter
actionSet = [-10 10];
which appears to account for the action boundaries. In my generated evaluatePolicy.m this is missing.
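For context on what I have tried: my working hypothesis is that a DDPG actor often ends in a tanh layer whose raw output spans [-1,1], and that the generated evaluatePolicy.m calls predict on the raw network without applying the scaling and clipping implied by rlNumericSpec. Under that assumption (the wrapper name, the linear map, and the reuse of the spec's limits are my own, not generated code), a workaround sketch would be:

```matlab
function action1 = evaluatePolicyScaled(observation1)
% Sketch: map the raw network output (assumed to lie in [-1,1], e.g. from
% a tanh output layer) onto the action range [0,1] defined by
% rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1).
% Verify against the trained agent's outputs before relying on this.
lowerLimit = 0;   % from rlNumericSpec 'LowerLimit'
upperLimit = 1;   % from rlNumericSpec 'UpperLimit'
raw = evaluatePolicy(observation1);                            % raw output, assumed in [-1,1]
scaled = lowerLimit + (raw + 1)*(upperLimit - lowerLimit)/2;   % linear map [-1,1] -> [0,1]
action1 = min(max(scaled, lowerLimit), upperLimit);            % clip to the spec bounds
end
```

For example, a raw output of -1 would map to 0 and a raw output of 0.9986 would map to about 0.9993, which would at least be consistent with the clipped [0,1] range seen during training. I would still like to know whether this scaling is supposed to be part of the generated file.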