How to pretrain a stochastic actor network for PPO training?

Jan Dewez on 6 May 2021
Commented: Anh Tran on 17 May 2021
I want to create a stochastic actor network that outputs an action array of 10 values between 0 and 1 given an observation array of 28 normalized values. I specified upper and lower limits as follows to ensure the actor's output to be between 0 and 1:
ActionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0;0;0;0;0;0;0;0;0;0],'UpperLimit',[1;1;1;1;1;1;1;1;1;1]);
My stochastic network looks as follows:
I have created a normalized training data set (input dimension 28, target dimension 10). How do I use this data set to pretrain above network?
Clarification: I want to train the network before starting the PPO agent training.

Accepted Answer

Anh Tran on 13 May 2021
Hi Jan,
You can pretrain a stochastic actor with trainNetwork from Deep Learning Toolbox, but it takes some additional work. Emmanouil gave some good pointers; I want to add the following steps:
You need a custom loss layer, because the stochastic actor network outputs means and standard deviations while your targets are actions. You can try a maximum log-likelihood loss. You can follow the instructions here to create a custom loss layer (you don't have to implement the backward pass, since automatic differentiation will take care of it).
% We want to maximize the objective log f(x), where f(x) is the probability density function of Normal(mu, sigma)
% Loss = -Objective = -log(f(x)) = 1/2*log(2*pi) + log(sigma) + 1/2*((x-mu)/sigma)^2;
Keep in mind that you must protect against log(0); adding eps to sigma is sufficient. Here x is your action target.
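The loss above is the negative log-density of a normal distribution. As a minimal sketch in plain MATLAB (no toolbox calls; the handle names `nll` and `pdfNormal` and the numeric values are hypothetical, chosen only for illustration), the closed form can be checked against -log of the density written out by hand:

```matlab
% Closed-form negative log-likelihood of x under Normal(mu, sigma).
% eps guards against log(0) and division by zero if sigma collapses to 0.
nll = @(x, mu, sigma) 0.5*log(2*pi) + log(sigma + eps) ...
    + 0.5*((x - mu)./(sigma + eps)).^2;

% Normal probability density written out by hand, for comparison only.
pdfNormal = @(x, mu, sigma) exp(-0.5*((x - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));

x = 0.3; mu = 0.5; sigma = 0.2;      % illustrative values, not from the thread
lossClosed = nll(x, mu, sigma);
lossDirect = -log(pdfNormal(x, mu, sigma));
gap = abs(lossClosed - lossDirect);  % expected to be at floating-point noise level
```

Note that this loss can be negative when sigma is small, because a density can exceed 1; that is expected and not a sign of a bug.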
Comments: 4
Jan Dewez on 15 May 2021
I rewrote my custom regression class like this:
classdef myRegressionLayer < nnet.layer.RegressionLayer
    methods
        function layer = myRegressionLayer(name)
            % (Optional) Create a myRegressionLayer.
            layer.Name = name;
            layer.Description = 'maximum log likelihood loss';
            % Layer constructor function goes here.
        end
        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the training
            % targets T.
            %
            % Inputs:
            % layer - Output layer
            % Y - Predictions made by network (20 x minibatchsize)
            % T - Training targets (20 x minibatchsize)
            %
            % Output:
            % loss - Loss between Y and T
            numActions = height(Y)/2;
            mu = Y(1:numActions,:); %(10 x minibatchsize)
            sigma = Y(numActions+1:end,:); %(10 x minibatchsize)
            for i = 1:numActions
                loss(i,:) = 0.5*log(2*pi) + log(sigma(i,:)+eps) + 0.5*((T(i,:)-mu(i,:))./(sigma(i,:)+eps)).^2;
            end
            disp('loss: ');
            disp(loss);
        end
    end
end
For example, when I set MiniBatchSize to 5, loss looks like this:
loss:
10×5 single dlarray
0.7065 0.7062 0.7346 0.7249 0.6832
1.0642 1.0203 1.0669 1.0500 1.0539
0.7998 1.0349 1.3149 1.2599 0.8729
1.5574 1.5650 1.5613 1.6017 1.5787
1.2772 1.1369 1.5798 1.4769 1.2660
0.7744 0.7541 0.7840 0.7776 0.7427
0.8501 0.8206 0.8311 0.8372 0.8288
0.7570 0.7704 0.7467 0.8035 0.7890
0.7789 0.7916 0.7898 0.7881 0.8122
0.7692 0.7411 0.7553 0.7528 0.7689
Followed by this error:
Error using trainNetwork (line 183)
Error using 'backwardLoss' in Layer myRegressionLayer. The function threw an error and
could not be executed.
Error in pretraining (line 42)
net = trainNetwork(PtrainArray2,TtrainArray2_ext,net_actor,options);
Caused by:
Error using dlarray/dlgradient (line 51)
Value to differentiate must be a traced dlarray scalar.
I am not sure how to fix this. What should 'loss' look like?
Anh Tran on 17 May 2021
As the error message says, the value to differentiate must be a scalar. Thus, you need to reduce the loss to a single number by averaging it over the mini-batch. Also, I am not sure why you need a for-loop to compute the loss. The computation can be vectorized as follows (since sigma, T, and mu have the same size):
% Vectorized loss computation
loss = 0.5*log(2*pi) + log(sigma + eps) + 0.5*((T-mu)./(sigma+eps)).^2;
% Mean of the loss over the mini-batch (observations are along dimension 2)
batchSize = size(Y,2);
loss = sum(loss,'all');
loss = loss/batchSize;
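To make the shapes concrete, here is a minimal sketch of the same reduction on stand-in data, assuming the layout described in this thread (rows 1 to 10 of Y are means, rows 11 to 20 are standard deviations, one column per observation). The numeric values are hypothetical and plain arrays are used instead of dlarray objects:

```matlab
% Illustrative sizes from the thread: 10 actions, mini-batch of 5.
numActions = 10;
batchSize  = 5;

% Stand-in network output: means on top, standard deviations below.
Y = [0.5*ones(numActions, batchSize); 0.2*ones(numActions, batchSize)];
% Stand-in action targets (10 x batchSize).
T = 0.3*ones(numActions, batchSize);

mu    = Y(1:numActions, :);
sigma = Y(numActions+1:end, :);

% Element-wise negative log-likelihood, no loop needed.
lossPerElement = 0.5*log(2*pi) + log(sigma + eps) + 0.5*((T - mu)./(sigma + eps)).^2;

% Collapse to one scalar: sum over actions and observations,
% then divide by the number of observations in the batch.
loss = sum(lossPerElement(:)) / size(Y, 2);
```

Here `sum(lossPerElement(:))` gives the same result as `sum(lossPerElement,'all')`. The result is a single number, which is what the differentiation step in the error message above requires.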


More Answers (1)

Emmanouil Tzorakoleftherakis on 13 May 2021
Hello,
Since you already have a dataset, you will have to use Deep Learning Toolbox to get your initial policy. Take a look at the examples below to get an idea:
Comments: 1
Jan Dewez on 13 May 2021
Hello Emmanouil,
Thanks for the response, but how do I train a stochastic actor with output dimension 20 when my training data has dimension 10? Do I need to convert my training set so that I obtain means and standard deviations?
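With a likelihood-based loss, the targets do not need to be converted into means and standard deviations, because the loss compares the predicted distribution directly with the raw actions. If the training function still expects responses with as many rows as the network output (an assumption here, though the 20-row T in the custom layer in this thread suggests it), one possible workaround is to pad the targets and let the loss read only the first 10 rows. A minimal sketch with hypothetical stand-in data, laid out as actions by samples:

```matlab
% Hypothetical data: 10 action targets for each of 100 samples.
numActions = 10;
numSamples = 100;
T = rand(numActions, numSamples);   % stand-in for the real targets in [0, 1]

% Pad with zeros so the response has as many rows as the network output (20).
% The padded rows carry no information; the loss must ignore them.
Tpadded = [T; zeros(numActions, numSamples)];

% Inside the loss, recover the real targets from the first numActions rows.
Trecovered = Tpadded(1:numActions, :);
```

The array may need to be transposed to match the orientation the training function expects for its inputs.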

