Creating an actorLossFunction for rlContinuousDeterministicActor
Hi, in the example the actor loss function for an rlDiscreteCategoricalActor is the following:
function loss = actorLossFunction(policy, lossData)
    policy = policy{1};
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Elements',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));
    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;
    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end
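For context, the example does not call this loss directly; it passes the function handle to gradient, which evaluates the loss on the actor output and returns the gradients of the loss with respect to the actor's learnable parameters. A minimal sketch of that wiring, where observationBatch, actionBatch, and discountedReturn are hypothetical names for data already collected from an episode:

% Package everything the loss function reads from lossData.
lossData.batchSize = batchSize;                  % number of samples in the batch
lossData.actInfo = actInfo;
lossData.actionBatch = actionBatch;              % hypothetical collected actions
lossData.discountedReturn = discountedReturn;    % hypothetical returns G_t

% Evaluate the loss on the actor output and get d(loss)/d(parameters).
actorGradient = gradient(actor,@actorLossFunction,{observationBatch},lossData);

% Apply the gradients with the rlOptimizer object (output order as in the
% custom-training-loop example; treat this line as a sketch).
[actor,actorOptimizer] = update(actorOptimizer,actor,actorGradient);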
Here are my action and observation specifications:
actInfo =

  rlNumericSpec with properties:

     LowerLimit: [2×1 double]
     UpperLimit: [2×1 double]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [2 1]
       DataType: "double"

obsInfo =

  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "pendulum_force, cart position, cart velocity"
      Dimension: [4 1501]
       DataType: "double"
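For reference, specifications like these are built with rlNumericSpec. A sketch with placeholder limits, since the actual 2×1 limit vectors are not printed above:

% Placeholder limits -- the real 2x1 vectors are not shown in the printout.
actInfo = rlNumericSpec([2 1],'LowerLimit',[-10;-10],'UpperLimit',[10;10]);
actInfo.Name = "CartPole Action";

obsInfo = rlNumericSpec([4 1501]);   % limits default to -Inf/Inf
obsInfo.Name = "CartPole States";
obsInfo.Description = "pendulum_force, cart position, cart velocity";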
Here is how I set up my actor:
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
To create my loss function, can I do the following?
function loss = actorLossFunction(policy, lossData)
    policy = policy{1};
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));
    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;
    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end
Accepted Answer

Takeshi Takahashi
2 Jun 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions, while rlContinuousDeterministicActor is for deterministic continuous actions, so you need a different formulation.
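For illustration only: a deterministic actor outputs actions, not action probabilities, so the log(policy) loss above has nothing to operate on. A DDPG-style formulation instead moves the policy toward actions with higher critic value. A minimal sketch, assuming a critic (for example an rlQValueFunction) and the observation batch are supplied through hypothetical lossData fields, and that the critic evaluation is differentiable so the gradients reach the actor parameters:

function loss = deterministicActorLossFunction(action, lossData)
    % action{1} is the actor output mu(s) for the observation batch.
    action = action{1};
    % Q(s, mu(s)) from a critic passed in via lossData (assumed here).
    qValue = evaluate(lossData.critic,{lossData.observationBatch,action});
    qValue = qValue{1};
    % Deterministic policy gradient: maximize Q, i.e. minimize -mean(Q).
    loss = -sum(qValue,'all')/lossData.batchSize;
end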