How to get the value of value function in soft actor critic?

Question

ryunosuke tazawa on 20 Oct 2021

https://kr.mathworks.com/matlabcentral/answers/1567868-how-to-get-the-value-of-value-function-in-soft-actor-critic

Commented: ryunosuke tazawa on 19 Nov 2021

I want to know how to get the value of the value function.

I am using a soft actor-critic (SAC) agent.

Can someone tell me how to do this?

%  Soft-actor-critic
clear all;
close all;
Length = 1;                              
Mass = 1;                                 
Ts = 0.01;                                 
Theta_Initial = -pi;                       
AngularVelocity_Initial = 0;              
SimplePendulum = classPendulum(Length, Mass, Theta_Initial, AngularVelocity_Initial, Ts);
ObservationInfo = rlNumericSpec([2 1]);
ObservationInfo.Name = 'States';
ObservationInfo.Description = 'Theta, AngularVelocity';
ActionInfo = rlNumericSpec([1 1],'LowerLimit',-100,'UpperLimit',-5);
ActionInfo.Name = 'Action';
ActionInfo.Description = 'F';
ResetHandle = @()myResetFunction(SimplePendulum);
StepHandle = @(Action,LoggedSignals) myStepfunction(Action,LoggedSignals,SimplePendulum);
env = rlFunctionEnv(ObservationInfo, ActionInfo, StepHandle, ResetHandle);
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
numObs  = obsInfo.Dimension(1);
numAct  = numel(actInfo);
device = 'gpu';
% CRITIC
statePath1 = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    ];
actionPath1 = [
    featureInputLayer(numAct,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')
    ];
commonPath1 = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu1')
    fullyConnectedLayer(1,'Name','CriticOutput')
    ];
criticNet = layerGraph(statePath1);
criticNet = addLayers(criticNet,actionPath1);
criticNet = addLayers(criticNet,commonPath1);
criticNet = connectLayers(criticNet,'CriticStateFC2','add/in1');
criticNet = connectLayers(criticNet,'CriticActionFC1','add/in2');
criticOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,... 
                                        'GradientThreshold',1,'L2RegularizationFactor',2e-4,'UseDevice',device);
critic1 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
critic2 = rlQValueRepresentation(criticNet,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
%ACTOR
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(400, 'Name','commonFC1')
    reluLayer('Name','CommonRelu')];
meanPath = [
    fullyConnectedLayer(300,'Name','MeanFC1')
    reluLayer('Name','MeanRelu')
    fullyConnectedLayer(numAct,'Name','Mean')
    ];
stdPath = [
    fullyConnectedLayer(300,'Name','StdFC1')
    reluLayer('Name','StdRelu')
    fullyConnectedLayer(numAct,'Name','StdFC2')
    softplusLayer('Name','StandardDeviation')];
concatPath = concatenationLayer(1,2,'Name','GaussianParameters');
actorNetwork = layerGraph(statePath);
actorNetwork = addLayers(actorNetwork,meanPath);
actorNetwork = addLayers(actorNetwork,stdPath);
actorNetwork = addLayers(actorNetwork,concatPath);
actorNetwork = connectLayers(actorNetwork,'CommonRelu','MeanFC1/in');
actorNetwork = connectLayers(actorNetwork,'CommonRelu','StdFC1/in');
actorNetwork = connectLayers(actorNetwork,'Mean','GaussianParameters/in1');
actorNetwork = connectLayers(actorNetwork,'StandardDeviation','GaussianParameters/in2');
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-3,...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5,'UseDevice',device);
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,actorOptions,...
    'Observation',{'observation'});
agentOptions = rlSACAgentOptions;
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MiniBatchSize = 32;
agent = rlSACAgent(actor,[critic1 critic2],agentOptions);
getAction(agent,{rand(obsInfo(1).Dimension)});
maxepisodes = 10;
maxsteps = 2;
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'StopOnError','on',...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',Inf,...
    'ScoreAveragingWindowLength',10); 
trainingStats = train(agent,env,trainingOptions);
% Play the game with the trained agent
simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
% Q-value: here I want to get the value of the value function (Q-value)
% Is this the correct way?
batchobs = rand(2,1,64);
batchact = rand(1,1,64,1);
qvalue = getValue(critic2,{batchobs},{batchact});
%v = getValue(critic2,{rand(2,1)},{rand(1,1)})
%save("kyori30Agent.mat","States")

Comments: 2

Martin Forsberg Lie on 8 Nov 2021

Edited: Martin Forsberg Lie on 8 Nov 2021

SAC is implemented with two critics, so you must choose which critic to evaluate:

critic = getCritic(agent);
value = getValue(critic(1),{obs},action);
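
To make the two lines above concrete, here is a minimal sketch rather than a definitive recipe. It assumes the Reinforcement Learning Toolbox API of the releases around this thread. As far as I know, the workspace variables critic1 and critic2 passed to rlSACAgent are not updated by train, so the critics are read back from the agent with getCritic. The stand-in agent, its specs, and the names demoObsInfo, demoActInfo, q1, q2 and qMin are illustrative and not from the thread.

```matlab
% Requires Reinforcement Learning Toolbox.
% Stand-in agent so the sketch runs on its own; in the script from the
% question, use the trained `agent` instead of creating a new one.
if ~exist('agent','var')
    demoObsInfo = rlNumericSpec([2 1]);                                % hypothetical specs
    demoActInfo = rlNumericSpec([1 1],'LowerLimit',-1,'UpperLimit',1);
    agent = rlSACAgent(demoObsInfo,demoActInfo);                       % untrained, default networks
end

critics = getCritic(agent);              % SAC agents hold two Q-value critics
obsInfo = getObservationInfo(agent);

obs = rand(obsInfo.Dimension);           % one arbitrary observation
act = getAction(agent,{obs});            % action sampled from the current policy
if ~iscell(act)                          % getValue expects cell arrays
    act = {act};
end

q1 = getValue(critics(1),{obs},act);     % Q1(s,a) from the first critic
q2 = getValue(critics(2),{obs},act);     % Q2(s,a) from the second critic
qMin = min(q1,q2);                       % the smaller of the two estimates
```

Both critics are Q-value critics, so getValue returns Q(s,a) for the given observation-action pair, not a state value V(s). Taking the minimum of the two follows the clipped double-Q idea that SAC uses for its targets; use critics(1) or critics(2) alone if you only need one estimate.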

ryunosuke tazawa on 19 Nov 2021

'The function or variable 'agent' is not recognized.'

critic = getCritic(agent);

value = getValue(critic(1),{obs},action);

I added these lines, but I got the error above.

Do you know how to fix it?
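
The error message says that no variable named agent exists in the workspace where the two lines ran. One possible cause, which is only an assumption since the thread does not say where the lines were added, is that they ran in a new session, inside a function, or after clear all but before the script created agent. The sketch below shows one way to keep the agent available across sessions by saving it to a MAT-file and loading it back before calling getCritic. The file name, the stand-in agent, and the names demoObsInfo and demoActInfo are hypothetical.

```matlab
% Requires Reinforcement Learning Toolbox.
% Stand-in for the trained agent from the question (hypothetical specs).
demoObsInfo = rlNumericSpec([2 1]);
demoActInfo = rlNumericSpec([1 1],'LowerLimit',-1,'UpperLimit',1);
agent = rlSACAgent(demoObsInfo,demoActInfo);

% Save the agent after training. The file name is hypothetical.
agentFile = [tempname '.mat'];
save(agentFile,'agent');

% Simulate a fresh workspace: 'agent' no longer exists here, so calling
% getCritic(agent) at this point would raise the error quoted above.
clear agent

% Load the agent back before using it.
loaded = load(agentFile,'agent');
agent = loaded.agent;

critic = getCritic(agent);               % now succeeds
delete(agentFile);                       % clean up the temporary file
```

If the lines are placed in the same script as the question's code, putting them after the train call should have the same effect without saving anything.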

Answers (0)