How can I store the trained model and then call it to output the best action during testing?

How can I store the trained model and then call it to output the best action during testing? My training code is below.
ResetFcn = @myResetFunction;
StepFcn = @myStepFunction;
% Create the environment interface and get observation and action info from it
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'observation';
ActionInfo = rlNumericSpec([7 1],...
    'LowerLimit',-1,'UpperLimit',1);
ActionInfo.Name = 'Price';
env = rlFunctionEnv(ObservationInfo,ActionInfo,StepFcn,ResetFcn);
Ts = 0.0005;   % step time (s)
Tf = 0.02;     % episode duration (s)
%%
% Create the critic network
rng(0)
statePath = [
    featureInputLayer(7,'Normalization','none','Name','observation')
    fullyConnectedLayer(256,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(128,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(7,'Normalization','none','Name','action')
    fullyConnectedLayer(128,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
% figure
% plot(criticNetwork)
% Set the critic learning rate and gradient threshold (to prevent exploding gradients)
criticOpts = rlRepresentationOptions('LearnRate',0.001,'GradientThreshold',1);
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'action'},criticOpts);
%%
% Create the actor network
actorNetwork = [
    featureInputLayer(7,'Normalization','none','Name','observation')
    fullyConnectedLayer(256,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(128,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(7,'Name','ActorFC3')
    tanhLayer('Name','tanh1')];
actorOpts = rlRepresentationOptions('LearnRate',0.0001,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'observation'},'Action',{'tanh1'},actorOpts);
%%
% Create the DDPG agent
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',0.02,...
    'TargetSmoothFactor',0.001,...        % smoothing factor for soft target-network updates
    'ExperienceBufferLength',1000000,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',64);
agentOpts.NoiseOptions.Variance = 0.002;          % exploration noise variance
agentOpts.NoiseOptions.VarianceDecayRate = 1e-6;  % noise decays slowly over training
agent = rlDDPGAgent(actor,critic,agentOpts);
%%
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 3000;
trainOpts.MaxStepsPerEpisode = ceil(Tf/Ts);
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 10000;
trainOpts.ScoreAveragingWindowLength = 2;
trainOpts.UseParallel = true;
trainOpts.ParallelizationOptions.Mode = 'async';
trainOpts.ParallelizationOptions.DataToSendFromWorkers = 'experiences';
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 32;
doTraining = true;
if doTraining
trainingStats = train(agent,env,trainOpts);
end

Answers (1)

Ayush Aniket on 27 September 2023
As I understand it, you want to store a trained RL agent and then use it during testing to output the best action it learned in training.
You can use the `save` function to store the trained model as shown below:
save('trainedAgent.mat', 'agent');
You can read more about saving candidate agents during training in the Reinforcement Learning Toolbox documentation.
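If you also want MATLAB to snapshot candidate agents automatically while training runs, rlTrainingOptions has SaveAgent* properties for this. A minimal sketch, set on trainOpts before calling train; the reward threshold and folder name are only placeholders:
trainOpts.SaveAgentCriteria = "EpisodeReward";    % save a copy whenever an episode reward ...
trainOpts.SaveAgentValue = 5000;                  % ... exceeds this example threshold
trainOpts.SaveAgentDirectory = "savedAgents";     % example folder for the saved .mat files
Each saved file can then be loaded in the same way as the manually saved agent below.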
The `save` call above writes the trained agent to a .mat file. To load the trained agent later, use the `load` function:
load('trainedAgent.mat', 'agent');
After loading the trained agent, you can use it to choose the best action for a given observation, based on what it learned during training. The `getAction` function of the agent can be used for this:
getAction(agent,{rand(obsInfo.Dimension)});
In your case, the observation input will differ according to your environment. You can also run a full closed-loop test against the environment, as sketched below.
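To check the learned behavior over a whole episode rather than a single observation, one option is to simulate the agent against the environment with `sim`. A minimal sketch, assuming the file name from the `save` call above and MaxSteps equal to ceil(Tf/Ts) = 40 from your training script:
load('trainedAgent.mat','agent');
simOpts = rlSimulationOptions('MaxSteps',40);   % one evaluation episode of 40 steps
experience = sim(env,agent,simOpts);            % run the learned policy in the environment
priceActions = experience.Action.Price.Data;    % logged actions ('Price' is the action name in your spec)
episodeReward = sum(experience.Reward.Data);    % total reward collected during the episode
By default the agent should apply its learned deterministic policy here (no exploration noise), so the logged actions reflect the best actions it has learned for the observations it encounters.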
Hope it helps.
