In TrainMBPOAgentToBalanceCartPoleSystemExample, what is nextObs in cartPoleRewardFunction?
function reward = cartPoleRewardFunction(obs,action,nextObs)
    % Compute reward value based on the next observation.
    if iscell(nextObs)
        nextObs = nextObs{1};
    end
    % Distance at which to fail the episode
    xThreshold = 2.4;
    % Reward each time step the cart-pole is balanced
    rewardForNotFalling = 1;
    % Penalty when the cart-pole fails to balance
    penaltyForFalling = -50;
    x = nextObs(1,:);
    distReward = 1 - abs(x)/xThreshold;
    isDone = cartPoleIsDoneFunction(obs,action,nextObs);
    reward = zeros(size(isDone));
    reward(logical(isDone)) = penaltyForFalling;
    reward(~logical(isDone)) = ...
        0.5 * rewardForNotFalling + 0.5 * distReward(~logical(isDone));
end
I really want to know where nextObs is passed into this function from. Why can't I find this variable in the main function?
If my environment is built in Simulink, how do I get the nextObs variable?
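For context on why nextObs never appears as a variable in the main code: as far as I understand the example, cartPoleRewardFunction is handed over as a function handle to the model-based (neural network) environment, and that environment calls it internally with the next observation predicted by its transition model. The snippet below is only a minimal sketch of that calling pattern, not toolbox code. transitionFcn and rewardFcn are hypothetical stand-ins invented for illustration, and the dynamics are made up.

```matlab
% Hypothetical stand-ins (assumptions for illustration, not toolbox code).
% A toy "transition model": takes obs and action, returns a predicted next obs.
transitionFcn = @(obs, action) obs + 0.02 * [obs(2,:); action; obs(4,:); -action];
% A simplified reward with the same (obs, action, nextObs) signature as above.
rewardFcn = @(obs, action, nextObs) 1 - abs(nextObs(1,:)) / 2.4;

obs    = [1.2; 0.5; 0.05; 0];   % assumed layout: first row is cart position x
action = 10;

% The caller (here this script, in the example the model-based environment)
% computes nextObs and passes it in. The reward function never looks it up.
nextObs = transitionFcn(obs, action);
reward  = rewardFcn(obs, action, nextObs);
```

So nextObs is simply the third input argument, supplied by whatever code invokes the reward function handle.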

