accelerate
Option to accelerate the gradient computation for a neural-network-based function approximator object
Description
newAppx = accelerate(oldAppx,useAcceleration) returns a new neural-network-based function approximator object, newAppx, with the same configuration as the original object, oldAppx, and the option to accelerate the gradient computation set to the logical value useAcceleration.
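For example, a minimal call looks like the following sketch, where myCritic is a placeholder name for a neural-network-based approximator (such as an rlQValueFunction object) that you have already created.
% Sketch: enable accelerated gradient computation on an existing approximator.
% myCritic is assumed to be a previously created approximator object.
myCritic = accelerate(myCritic,true);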
Examples
Accelerate Gradient Computation for a Q-Value Function
Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as consisting of two channels. The first channel carries an observation from a continuous four-dimensional space. The second carries a discrete scalar observation that can be either zero or one. Finally, define the action space as a continuous three-dimensional space.
obsInfo = [rlNumericSpec([4 1]) rlFiniteSetSpec([0 1])];
actInfo = rlNumericSpec([3 1]);
Create a deep neural network to be used as the approximation model within the critic. The output layer must return a scalar expressing the value of executing the action given the observation. To create a recurrent neural network, use sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers.
inPath1 = [ sequenceInputLayer(prod(obsInfo(1).Dimension), ...
                'Normalization','none','Name','netObsIn1')
            fullyConnectedLayer(5,'Name','infc1') ];

inPath2 = [ sequenceInputLayer(prod(obsInfo(2).Dimension), ...
                'Normalization','none','Name','netObsIn2')
            fullyConnectedLayer(5,'Name','infc2') ];

inPath3 = [ sequenceInputLayer(prod(actInfo(1).Dimension), ...
                'Normalization','none','Name','netActIn')
            fullyConnectedLayer(5,'Name','infc3') ];

% Concatenate the outputs of the previous layers along the first dimension
jointPath = [ concatenationLayer(1,3,'Name','cct')
              tanhLayer('Name','tanhJnt')
              lstmLayer(8,'OutputMode','sequence','Name','lstm')
              fullyConnectedLayer(1,'Name','jntfc') ];

% Add layers to the layer graph object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,inPath3);
net = addLayers(net,jointPath);

% Connect layers
net = connectLayers(net,'infc1','cct/in1');
net = connectLayers(net,'infc2','cct/in2');
net = connectLayers(net,'infc3','cct/in3');

% Plot network
plot(net)
Create the critic with rlQValueFunction, using the network and the observation and action specification objects.
critic = rlQValueFunction(net,obsInfo,actInfo);
To return the value of the action as a function of the current observation, use getValue or evaluate.
val = evaluate(critic, ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension), ...
          rand(actInfo(1).Dimension) })
val = 1x1 cell array
{[0.1360]}
When using evaluate, the result is a single-element cell array containing the value of the input action, given the observation.
val{1}
ans = single
0.1360
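Alternatively, you can obtain the value with getValue, which takes the observation channels and the action as separate cell arrays. The following sketch mirrors the evaluate call above, using random inputs.
% Sketch: getValue takes the observations and the action in separate
% cell arrays and returns the corresponding value.
v = getValue(critic, ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension) }, ...
        { rand(actInfo(1).Dimension) });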
Calculate the gradients of the sum of the outputs with respect to the inputs, given a random observation.
gro = gradient(critic,'output-input', ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension), ...
          rand(actInfo(1).Dimension) })
gro=3×1 cell array
{4x1 single}
{[ 0.0243]}
{3x1 single}
The result is a cell array with as many elements as the number of input channels. Each element contains the derivatives of the sum of the outputs with respect to each component of the input channel. Display the gradient with respect to the element of the second channel.
gro{2}
ans = single
0.0243
Obtain the gradient for a batch of 5 independent sequences, each consisting of 9 sequential observations.
gro_batch = gradient(critic,'output-input', ...
        { rand([obsInfo(1).Dimension 5 9]), ...
          rand([obsInfo(2).Dimension 5 9]), ...
          rand([actInfo(1).Dimension 5 9]) })
gro_batch=3×1 cell array
{4x5x9 single}
{1x5x9 single}
{3x5x9 single}
Display the derivative of the sum of the outputs with respect to the third observation element of the first input channel, after the seventh sequential observation in the fourth independent sequence.
gro_batch{1}(3,4,7)
ans = single
0.0108
Set the option to accelerate the gradient computations.
critic = accelerate(critic,true);
Calculate the gradients of the sum of the outputs with respect to the parameters, given a random observation.
grp = gradient(critic,'output-parameters', ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension), ...
          rand(actInfo(1).Dimension) })
grp=11×1 cell array
{ 5x4 single }
{ 5x1 single }
{ 5x1 single }
{ 5x1 single }
{ 5x3 single }
{ 5x1 single }
{32x15 single }
{32x8 single }
{32x1 single }
{[0.0444 0.1280 -0.1560 0.0193 0.0262 0.0453 -0.0186 -0.0651]}
{[ 1]}
Each array within a cell contains the gradient of the sum of the outputs with respect to a group of parameters.
Obtain the gradient with respect to the parameters for a batch of 5 independent sequences, each consisting of 9 sequential observations.
grp_batch = gradient(critic,'output-parameters', ...
        { rand([obsInfo(1).Dimension 5 9]), ...
          rand([obsInfo(2).Dimension 5 9]), ...
          rand([actInfo(1).Dimension 5 9]) })
grp_batch=11×1 cell array
{ 5x4 single }
{ 5x1 single }
{ 5x1 single }
{ 5x1 single }
{ 5x3 single }
{ 5x1 single }
{32x15 single }
{32x8 single }
{32x1 single }
{[2.6325 10.1821 -14.0886 0.4162 2.0677 5.3991 0.3904 -8.9048]}
{[ 45]}
If you use a batch of inputs, the gradient is calculated considering the whole input sequence (in this case 9 steps), and all the gradients with respect to the independent batch dimensions (in this case 5) are added together. Therefore, the returned gradient always has the same size as the output from getLearnableParameters.
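As a quick check (a sketch, assuming the critic and grp_batch computed above), you can compare the size of each gradient array with the size of the corresponding learnable parameter array.
% Sketch: each gradient array should have the same size as the
% corresponding learnable parameter array of the critic.
params = getLearnableParameters(critic);
sameSize = cellfun(@(g,p) isequal(size(g),size(p)),grp_batch(:),params(:))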
Accelerate Gradient Computation for a Discrete Categorical Actor
Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as consisting of two channels. The first channel carries an observation from a continuous four-dimensional space. The second carries a discrete scalar observation that can be either zero or one. Finally, the action space consists of a scalar that can be -1, 0, or 1.
obsInfo = [rlNumericSpec([4 1]) rlFiniteSetSpec([0 1])];
actInfo = rlFiniteSetSpec([-1 0 1]);
Create a deep neural network to be used as the approximation model within the actor. The output layer must have three elements, each one expressing the value of executing the corresponding action, given the observation. To create a recurrent neural network, use sequenceInputLayer as the input layer and include an lstmLayer as one of the other network layers.
inPath1 = [ sequenceInputLayer(prod(obsInfo(1).Dimension), ...
                'Normalization','none','Name','netObsIn1')
            fullyConnectedLayer(prod(actInfo.Dimension),'Name','infc1') ];

inPath2 = [ sequenceInputLayer(prod(obsInfo(2).Dimension), ...
                'Normalization','none','Name','netObsIn2')
            fullyConnectedLayer(prod(actInfo.Dimension),'Name','infc2') ];

% Concatenate the outputs of the previous paths along the first dimension
jointPath = [ concatenationLayer(1,2,'Name','cct')
              tanhLayer('Name','tanhJnt')
              lstmLayer(8,'OutputMode','sequence','Name','lstm')
              fullyConnectedLayer(numel(actInfo.Elements),'Name','jntfc') ];

% Add layers to the layer graph object
net = layerGraph;
net = addLayers(net,inPath1);
net = addLayers(net,inPath2);
net = addLayers(net,jointPath);

% Connect layers
net = connectLayers(net,'infc1','cct/in1');
net = connectLayers(net,'infc2','cct/in2');

% Plot network
plot(net)
Since each element of the output layer must represent the probability of executing one of the possible actions, the software automatically adds a softmaxLayer as a final output layer if you do not specify it explicitly.
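If you prefer to make the softmax explicit, you can instead append it to the layer graph yourself before creating the actor, as in the following sketch (the layer name 'sm' is arbitrary).
% Optional sketch: add the softmax output layer explicitly.
net = addLayers(net,softmaxLayer('Name','sm'));
net = connectLayers(net,'jntfc','sm');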
Create the actor with rlDiscreteCategoricalActor, using the network and the observation and action specification objects. When the network has multiple input layers, they are automatically associated with the environment observation channels according to the dimension specifications in obsInfo.
actor = rlDiscreteCategoricalActor(net, obsInfo, actInfo);
To return the probability of each possible action as a function of the current observation, use evaluate.
[prob,state] = evaluate(actor, ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension) });
The result is a single-element cell array containing a vector of probabilities for each possible action.
prob{1}
ans = 3x1 single column vector
0.3403
0.3114
0.3483
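Because of the final softmax layer, these probabilities sum to one. As a quick check (a sketch, assuming prob from the previous step):
% The action probabilities produced by the softmax layer sum to one.
sum(prob{1})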
To return an action sampled from the distribution, use getAction
.
act = getAction(actor, ...
        { rand(obsInfo(1).Dimension), ...
          rand(obsInfo(2).Dimension) });
act{1}
ans = 1
Set the option to accelerate the gradient computations.
actor = accelerate(actor,true);
Calculate the gradients of the sum of the outputs with respect to the parameters, given a batch of 5 independent sequences, each consisting of 9 sequential observations. Each array within a cell of the result contains the gradient of the sum of the outputs with respect to a group of parameters.
grp_batch = gradient(actor,'output-parameters', ...
        { rand([obsInfo(1).Dimension 5 9]), ...
          rand([obsInfo(2).Dimension 5 9]) })
grp_batch=9×1 cell array
{[-3.6041e-09 -3.5829e-09 -2.8805e-09 -3.2158e-09]}
{[ -9.0017e-09]}
{[ -1.5321e-08]}
{[ -3.0182e-08]}
{32x2 single }
{32x8 single }
{32x1 single }
{ 3x8 single }
{ 3x1 single }
If you use a batch of inputs, the gradient is calculated considering the whole input sequence (in this case 9 steps), and all the gradients with respect to the independent batch dimensions (in this case 5) are added together. Therefore, the returned gradient always has the same size as the output from getLearnableParameters.
Input Arguments
oldAppx — Function approximator object
rlValueFunction object | rlQValueFunction object | rlVectorQValueFunction object | rlDiscreteCategoricalActor object | rlContinuousDeterministicActor object | rlContinuousGaussianActor object | rlContinuousDeterministicTransitionFunction object | rlContinuousGaussianTransitionFunction object | rlContinuousDeterministicRewardFunction object | rlContinuousGaussianRewardFunction object | rlIsDoneFunction object
Function approximator object, specified as one of the following:
- rlValueFunction object
- rlQValueFunction object
- rlVectorQValueFunction object
- rlDiscreteCategoricalActor object
- rlContinuousDeterministicActor object
- rlContinuousGaussianActor object
- rlContinuousDeterministicTransitionFunction object
- rlContinuousGaussianTransitionFunction object
- rlContinuousDeterministicRewardFunction object
- rlContinuousGaussianRewardFunction object
- rlIsDoneFunction object
useAcceleration — Option to use acceleration for gradient computations
false (default) | true
Option to use acceleration for gradient computations, specified as a logical value. When useAcceleration is true, the gradient computations are accelerated by optimizing and caching some inputs needed by the automatic-differentiation computation graph. For more information, see Deep Learning Function Acceleration for Custom Training Loops.
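For example, in a custom training loop the accelerated approximator reuses its cached computation graph across repeated gradient calls. The following sketch assumes the critic, obsInfo, and actInfo from the first example; the batch data and the parameter update are placeholders.
critic = accelerate(critic,true);
for k = 1:100
    % Placeholder batch: 5 independent sequences of 9 observations (random data).
    batch = { rand([obsInfo(1).Dimension 5 9]), ...
              rand([obsInfo(2).Dimension 5 9]), ...
              rand([actInfo(1).Dimension 5 9]) };
    % Repeated gradient calls reuse the accelerated, cached graph.
    g = gradient(critic,'output-parameters',batch);
    % ... update the critic parameters here (for example, with an optimizer).
end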
Output Arguments
newAppx — Actor or critic
rlValueFunction object | rlQValueFunction object | rlVectorQValueFunction object | rlDiscreteCategoricalActor object | rlContinuousDeterministicActor object | rlContinuousGaussianActor object | rlContinuousDeterministicTransitionFunction object | rlContinuousGaussianTransitionFunction object | rlContinuousDeterministicRewardFunction object | rlContinuousGaussianRewardFunction object | rlIsDoneFunction object
New actor or critic, returned as the same type of object as oldAppx, but with the gradient acceleration option set to useAcceleration.
Version History
See Also
evaluate | gradient | getLearnableParameters | rlValueFunction | rlQValueFunction | rlVectorQValueFunction | rlContinuousDeterministicActor | rlDiscreteCategoricalActor | rlContinuousGaussianActor | rlContinuousDeterministicTransitionFunction | rlContinuousGaussianTransitionFunction | rlContinuousDeterministicRewardFunction | rlContinuousGaussianRewardFunction | rlIsDoneFunction