
Reinforcement Learning - How to use a 'trained policy' as a 'controller' block in SIMULINK

10 views (last 30 days)
J on 17 Oct 2019
Commented: H. M. on 20 Oct 2022
Hello,
I am following the example "Train DDPG Agent to Swing Up and Balance Cart-Pole System" shown here. I am trying to use the trained policy as a controller in Simulink for the cart-pole system, but I am having issues. I used the generatePolicyFunction function in MATLAB to create the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. I am confused about how to use the generated policy function in Simulink to replace the RL Agent block and have it act as the controller (policy) for the plant. Below is a screenshot of my Simulink model.
Here is my main code, following the MATLAB example mentioned above:
mdl = 'rlCartPoleSimscapeModel';
open_system(mdl)
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
actInfo = getActionInfo(env);
obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
Ts = 0.02;
Tf = 25;
rng(0)
statePath = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    imageInputLayer([1 1 1],'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
actorNetwork = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(128,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(200,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh1')
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))];
actorOptions = rlRepresentationOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling'},actorOptions);
agentOptions = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'MiniBatchSize',128);
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
maxepisodes = 2000;
maxsteps = ceil(Tf/Ts);
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-400,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',-400);
doTraining = false;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainingOptions);
else
    % Load pretrained agent for the example.
    load('SimscapeCartPoleDDPG.mat','agent')
end
simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
% bdclose(mdl)
%%
generatePolicyFunction(agent)
Here is my generated policy function code:
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 17-Oct-2019 13:34:45
action1 = localEvaluate(observation1);
end
%% Local Functions
function action1 = localEvaluate(observation1)
persistent policy
if isempty(policy)
    policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action1 = predict(policy,observation1);
end
The error I get is the following:
Simulink does not have enough information to determine output sizes for this block. If you think the errors below are inaccurate, try specifying types for the block inputs and/or sizes for the block outputs.
Undefined function or variable 'dltargets'. P-code function 'DeepLearningNetwork.p' produced an error.
Function call failed. Function 'loadDeepLearningNetwork.m' (#95.3569.3643), line 100, column 15: "coder.DeepLearningNetwork(coder.const(matfile), coder.const(''), param{:})" Launch diagnostic report.
Function call failed. Function 'MATLAB Function' (#173.295.353), line 13, column 14: "coder.loadDeepLearningNetwork('agentData.mat','agentData')" Launch diagnostic report.
Persistent variable 'agentData' must be assigned before it is used. The only exception is a check using 'isempty(agentData)' that can be performed prior to assignment. Function 'MATLAB Function' (#173.377.386), line 15, column 19: "agentData" Launch diagnostic report.
Function call failed. Function 'MATLAB Function' (#173.140.167), line 7, column 11: "localEvaluate(observation1)" Launch diagnostic report.
Errors occurred during parsing of MATLAB function 'rlCartPoleSimscapeModel1/MATLAB Function'
Simulink cannot determine sizes and/or types of the outputs for block 'rlCartPoleSimscapeModel1/MATLAB Function' due to errors in the block body, or limitations of the underlying analysis. The errors might be inaccurate. Fix the indicated errors, or explicitly specify sizes and/or types for all block outputs.
Any feedback is much appreciated.
Thank you.
2 Comments
Rajesh Siraskar on 3 Feb 2020
Hello MATLAB,
This question does not seem to have been answered - will someone please help?
Francisco Sanchez has provided a workaround - is that how it is meant to be done?
Thank you - I appreciate any help.
Rene Titze on 14 Feb 2020
Hi,
in the Function Block I use this code:
function y = predictTau(u)
% Pull the trained policy from the base workspace and evaluate it.
% evalin and predict run extrinsically (in MATLAB, not in generated code).
coder.extrinsic('evalin');
coder.extrinsic('predict');
policy = evalin('base', 'policy');
y = single(0.0);   % pre-size the output for Simulink
y = predict(policy,u);
end
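For this block to work, policy must already exist in the base workspace when the simulation starts. A minimal sketch, assuming agentData.mat (from generatePolicyFunction) stores the network; the variable name inside the MAT-file can differ, so check it first:
% Run once in the base workspace before simulating, so that
% evalin('base','policy') can find the network.
whos('-file','agentData.mat')   % inspect the saved variable name
S = load('agentData.mat');
policy = S.policy;              % assumed variable name 'policy'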


Answers (5)

Sandra Diaz Segura on 6 Aug 2020
Any answers? I think this is a legitimate question with enough information to be solved.
And given the lack of information in the RL user guide, MATLAB should have an answer to this!

Francisco Sanchez on 21 Oct 2019
Hi,
I also had this error. Instead of solving it, I worked around it by using an "Interpreted MATLAB Fcn" block pointing to the .m file generated by the generatePolicyFunction function of the RL Toolbox. It does work, although it is perhaps not the fastest way; a minimal sketch follows below.
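A minimal sketch of that wiring, under the assumption that evaluatePolicy.m and agentData.mat (generated above) are on the MATLAB path; the wrapper name policyWrapper is hypothetical:
function y = policyWrapper(u)
% Wrapper called by the "Interpreted MATLAB Fcn" block.
% evaluatePolicy.m / agentData.mat are the files produced by generatePolicyFunction.
y = evaluatePolicy(u(:));   % u(:) forces the observation into a column vector
end
In the block dialog, set the 'MATLAB function' parameter to policyWrapper(u) and make sure the output dimension matches the action size (1 here).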
Fran
4 Comments
Rajesh Siraskar on 3 Feb 2020
Hi Fran,
Any idea what I could be doing wrong here? I used generatePolicyFunction(agent) to generate the .m and .mat files and tried the trick you suggested. In the attached screenshots you can see the training setup first, then the 'deployment' setup, and then where I am getting the error.
Somehow the observation vector doesn't agree, although it is used as-is.
Invalid setting for output port dimensions of 'Observation vector/Mux1'. The dimensions are being set to 1. This is not valid because the total number of input and output elements are not the same
Component:Simulink | Category:Model error
Error in port widths or dimensions. Input port 1 of 'Observation vector/Observation vector' is a one dimensional vector with 1 elements.
[Attached screenshots: RLAgentSimulink3.PNG, RLAgentSimulink.PNG, RLAgentSimulink2.PNG]
H. M. on 20 Oct 2022
Hi Rajesh,
Did the problem you faced get solved? If yes, could you share the solution with me?
I am facing the same problem.
Thanks.



krishna teja on 11 Feb 2020
Hi,
Use a Reshape block to set the observation vector to a column vector before feeding it to the Interpreted MATLAB Fcn block; a sketch follows below.
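A sketch of that, done programmatically; the model and block names are placeholders, and the parameter string can vary by release (placing the block by hand from the library browser works just as well):
% Insert a Reshape block ahead of the policy block and force a column output.
add_block('simulink/Math Operations/Reshape','myModel/ReshapeObs');
set_param('myModel/ReshapeObs','OutputDimensionality','Column vector (2-D)');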

SomeMatlabUser on 20 Feb 2020
Is there any update (maybe from MathWorks itself) that actually solves this problem? I am trying to get my model working with code generation. Using the Interpreted MATLAB Function block is not suitable, and the documentation by MathWorks is lacking on this point.
1 Comment
Sviatoslav Klos on 3 Mar 2020
I already settled this issue for code generation in MATLAB, but I did not solve it for code generation in Simulink.



Kishen Mahadevan on 15 Mar 2021
Hello,
Starting in R2020b, the 'Predict' block and the 'MATLAB Function' block allow using pre-trained networks, including Reinforcement Learning policies, in Simulink to perform inference. You can use either of these blocks to replace the RL Agent block in your model; a sketch follows below.
Please refer to this MATLAB Answers post for more information.
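As a rough sketch of that approach (my illustration, not from this answer): the MATLAB Function block body can mirror the generated evaluatePolicy, pre-sizing the output so Simulink can infer the signal dimensions. It assumes a scalar continuous action and the agentData.mat produced by generatePolicyFunction; depending on the release, a deep learning library may also need to be selected in the model's simulation target settings.
function action = policyBlock(observation)
% MATLAB Function block body (sketch, R2020b or later).
action = single(0);   % pre-size the output so Simulink knows its size and type
persistent policy
if isempty(policy)
    % Load the network saved by generatePolicyFunction.
    policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action = predict(policy,observation);
end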
Hope it helps!
