
rlContinuousGaussianActor

Stochastic Gaussian actor with a continuous action space for reinforcement learning agents

Since R2022a

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent with a continuous action space. A continuous Gaussian actor takes an environment observation as input and returns as output a random action sampled from a parametrized Gaussian probability distribution, thereby implementing a parametrized stochastic policy. After you create an rlContinuousGaussianActor object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating actors and critics, see Create Policies and Value Functions.

Creation

Description

actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=meanOutLyrName,ActionStandardDeviationOutputNames=stdOutLyrName) creates a Gaussian stochastic actor with a continuous action space, using the deep neural network net as the approximation model. Here, net must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers must return the mean and standard deviation of each component of the action, respectively. The actor uses the outputs of the layers named in meanOutLyrName and stdOutLyrName to represent the Gaussian probability distribution from which the action is sampled. This syntax sets the ObservationInfo and ActionInfo properties of actor to the input arguments observationInfo and actionInfo, respectively.

Note

actor does not enforce constraints set by the action specification. When you use this actor in any agent other than a SAC agent, you must enforce action space constraints within the environment.
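One simple way to enforce the action limits in the environment is to saturate each element of the received action to the bounds of the action specification before using it. The following base-MATLAB sketch assumes scalar limits of -10 and 10, as in the example on this page. The variable names are hypothetical and are not part of the toolbox API. Saturation is only one possible way to handle out-of-range actions.

```matlab
% Hypothetical action bounds, matching the example on this page
lowerLimit = -10;
upperLimit = 10;

% An action sample that violates the bounds (illustrative values)
act = [-12.0285; 1.7628; 10.8733];

% Saturate each element to [lowerLimit, upperLimit] inside the environment
actClipped = min(max(act, lowerLimit), upperLimit);
disp(actClipped)
```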


actor = rlContinuousGaussianActor(___,Name=Value) uses one or more name-value arguments to specify the names of the observation input layers (for network-based approximators) or to set the UseDevice property. Specifying the input layer names allows you to explicitly associate the layers of your network approximator with specific environment channels. For all types of approximators, you can specify the device where computations for actor are executed, for example UseDevice="gpu".

Input Arguments


net

Deep neural network used as the underlying approximation model within the actor. It must have as many input layers as the number of environment observation channels, with each input layer receiving input from one observation channel. The network must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers return the mean and standard deviation of each component of the action, respectively. The actor uses the layers named in meanOutLyrName and stdOutLyrName to represent the Gaussian probability distribution from which the action is sampled.

Note

Because standard deviations must be nonnegative, the output layer that returns the standard deviations must be a softplus or ReLU layer. Also, unless the actor is used in a SAC agent, the mean values must fall within the range of the action. In that case, to scale the mean values to the action range, use a scaling layer as the output layer for the mean values, preceded by a hyperbolic tangent layer. SAC agents automatically read the action range from the UpperLimit and LowerLimit properties of the action specification, then internally scale the distribution and bound the action. Therefore, if you plan to use the actor in a SAC agent, do not add any layer that scales or bounds the mean values output.
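The following base-MATLAB sketch illustrates these output ranges numerically. It applies the element-wise operations performed by a tanh layer followed by a scaling layer, a softplus layer, and a ReLU layer. The raw branch outputs zMean and zStd and the symmetric limit of 10 are hypothetical values chosen for illustration.

```matlab
% Hypothetical raw outputs of the mean and standard deviation branches
zMean = [-3; 0; 0.5; 4];
zStd  = [-5; -0.1; 0; 2];

upperLimit = 10;   % assumed symmetric action range [-10, 10]

% tanh maps any real input into (-1,1); scaling by upperLimit maps it into (-10,10)
meanVals = upperLimit * tanh(zMean);

% softplus, log(1 + exp(z)), is strictly positive
stdSoftplus = log(1 + exp(zStd));

% ReLU, max(z,0), is nonnegative but can return exactly zero
stdRelu = max(zStd, 0);

disp([meanVals stdSoftplus stdRelu])
```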

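You can specify the network as one of the following:

  • Array of Layer objects

  • layerGraph object

  • DAGNetwork object

  • SeriesNetwork object

  • dlnetwork object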

Note

Among the different network representation options, dlnetwork is preferred, because it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using them to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet=dlnetwork(net), where net is any neural network object from the Deep Learning Toolbox™. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice gives you greater insight and control in cases where the conversion is not straightforward and might require additional specifications.

rlContinuousGaussianActor objects support recurrent deep neural networks.

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.

meanOutLyrName

Name of the network output layer corresponding to the mean values of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the mean value of each element of the action channel. Therefore, this network output layer must be named as indicated in meanOutLyrName. Furthermore, unless the actor is used in a SAC agent, this layer must be a scaling layer that scales the returned mean values to the desired action range.

Example: "myNetOutLyr_Mean_Values"

stdOutLyrName

Name of the network output layer corresponding to the standard deviations of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the standard deviation of each element of the action channel. Therefore, this network output layer must be named as indicated in stdOutLyrName. Furthermore, it must be a softplus or ReLU layer, to enforce nonnegativity of the returned standard deviations.

Example: "myNetOutLyr_Std_Values"

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: UseDevice="gpu"

ObservationInputNames

Names of the network input layers corresponding to the environment observation channels, specified as a string array or a cell array of strings or character vectors. The function assigns each environment observation channel specified in observationInfo, in order, to the network input layer whose name appears in the same position in this array. Therefore, the specified network input layers, ordered as indicated in this argument, must have the same data type and dimensions as the observation channels, as ordered in observationInfo.

Example: ObservationInputNames={"obsInLyr1_airspeed","obsInLyr2_altitude"}

Properties


ObservationInfo

Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

When you create the approximator object, the constructor function sets the ObservationInfo property to the input argument observationInfo.

You can extract observationInfo from an existing environment, function approximator, or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Example: [rlNumericSpec([2 1]) rlFiniteSetSpec([3,5,7])]

ActionInfo

Action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

Note

Only one action channel is allowed.

When you create the approximator object, the constructor function sets the ActionInfo property to the input argument actionInfo.

You can extract actionInfo from an existing environment, approximator object, or agent using getActionInfo. You can also construct the specification manually using rlNumericSpec.

Example: rlNumericSpec([2 1])

Normalization

Normalization method, returned as an array in which each element (one for each input channel defined in the observationInfo and actionInfo properties, in that order) is one of the following values:

  • "none" — Do not normalize the input of the function approximator object.

  • "rescale-zero-one" — Normalize the input by rescaling it to the interval between 0 and 1. The normalized input Y is (U - LowerLimit)./(UpperLimit - LowerLimit), where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than 0. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.

  • "rescale-symmetric" — Normalize the input by rescaling it to the interval between –1 and 1. The normalized input Y is 2*(U - LowerLimit)./(UpperLimit - LowerLimit) - 1, where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than –1. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.
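The two rescaling formulas can be checked with a short base-MATLAB sketch. The limits and input values are hypothetical. Inputs outside [LowerLimit, UpperLimit] map outside [0, 1] and [-1, 1], respectively.

```matlab
% Hypothetical channel limits and nonnormalized input values
LowerLimit = -2;
UpperLimit = 6;
U = [-4; -2; 2; 6; 8];

% "rescale-zero-one": maps [LowerLimit, UpperLimit] to [0, 1]
Y01 = (U - LowerLimit)./(UpperLimit - LowerLimit);

% "rescale-symmetric": maps [LowerLimit, UpperLimit] to [-1, 1]
Ysym = 2*(U - LowerLimit)./(UpperLimit - LowerLimit) - 1;

disp([U Y01 Ysym])
```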

Note

When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.

Example: "rescale-symmetric"

UseDevice

Computation device used to perform operations such as gradient computation, parameter updates, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

Example: "gpu"

Learnables

Learnable parameters of the approximation object, specified as a cell array of dlarray objects. This property contains the learnable parameters of the approximation model used by the approximator object.

Example: {dlarray(rand(256,4)),dlarray(rand(256,1))}

State

State of the approximation object, specified as a cell array of dlarray objects. For dlnetwork-based models, this property contains the Value column of the State property table of the dlnetwork model. The elements of the cell array are the states of the recurrent neural network used in the approximator (if any) and of any batch normalization layers (if used).

For model types that are not based on a dlnetwork object, this property is an empty cell array, since these model types do not support states.

Example: {dlarray(rand(256,1)),dlarray(rand(256,1))}

Object Functions

rlACAgent               Actor-critic (AC) reinforcement learning agent
rlPGAgent               Policy gradient (PG) reinforcement learning agent
rlPPOAgent              Proximal policy optimization (PPO) reinforcement learning agent
rlSACAgent              Soft actor-critic (SAC) reinforcement learning agent
getAction               Obtain action from agent, actor, or policy object given environment observations
evaluate                Evaluate function approximator object given observation (or observation-action) input data
gradient                (Not recommended) Evaluate gradient of function approximator object given observation and action input data
accelerate              (Not recommended) Option to accelerate computation of gradient for approximator object based on neural network
getLearnableParameters  Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters  Set learnable parameter values of agent, function approximator, or policy object
setModel                Set approximation model in function approximator object
getModel                Get approximation model from function approximator object

Examples


Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous five-dimensional space, so that there is a single observation channel that carries a column vector containing five doubles.

obsInfo = rlNumericSpec([5 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that the action channel carries a column vector containing three doubles, each between -10 and 10.

actInfo = rlNumericSpec([3 1], ...
    LowerLimit=-10, ...
    UpperLimit=10);

A continuous Gaussian actor implements a parametrized stochastic policy for a continuous action space. This actor takes an observation as input and returns as output a random action sampled from a Gaussian probability distribution.

To approximate the mean values and standard deviations of the Gaussian distribution, you must use a neural network with two output layers, each having as many elements as the dimension of the action space. One output layer must return a vector containing the mean values for each action dimension. The other must return a vector containing the standard deviation for each action dimension.

Note that standard deviations must be nonnegative and mean values must fall within the range of the action. Therefore, the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, while the output layer that returns the mean values must be a scaling layer, to scale the mean values to the output range. However, if you are going to use the actor within a SAC agent, do not add layers that scale or bound the mean values, such as a tanhLayer as the last nonlinear layer in the mean output path. For more information, see Soft Actor-Critic (SAC) Agents.

For this example, the environment has only one observation channel, and therefore the network has only one input layer. Note that prod(obsInfo.Dimension) and prod(actInfo.Dimension) return the number of dimensions of the observation and action spaces, respectively, regardless of whether they are arranged as row vectors, column vectors, or matrices.
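As a quick check of this behavior, the following base-MATLAB sketch applies prod to a few hypothetical Dimension vectors. It also confirms that an array created with rand from the same dimensions has that many elements, as in the rand(obsInfo.Dimension) calls used later on this page.

```matlab
% Hypothetical Dimension values of a specification object
dimColumn = [5 1];   % column vector with 5 elements
dimRow    = [1 5];   % row vector with 5 elements
dimMatrix = [2 3];   % 2-by-3 matrix with 6 elements

nColumn = prod(dimColumn);
nRow    = prod(dimRow);
nMatrix = prod(dimMatrix);

% An observation created from the same dimensions has prod(dim) elements
obs = rand(dimMatrix);
disp([nColumn nRow nMatrix numel(obs)])
```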

Define each network path as an array of layer objects, and assign names to the input and output layers of each path. These names allow you to connect the paths and then later explicitly associate the network input and output layers with the appropriate environment channel.

% Input path layers
inPath = [ 
    featureInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="netOin")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc") 
    ];

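% Path layers for mean value
% The scalingLayer multiplies its input by actInfo.UpperLimit (10 in this
% example). Because a fully connected layer follows the tanh layer, the
% scaled mean values are not strictly confined to (-10,10).
meanPath = [ 
    tanhLayer(Name="tanhMean");
    fullyConnectedLayer(prod(actInfo.Dimension));
    scalingLayer(Name="scale", ...
        Scale=actInfo.UpperLimit) 
    ];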

% Path layers for standard deviations
% Using softplus layer to make them nonnegative
sdevPath = [ 
    tanhLayer(Name="tanhStdv");
    fullyConnectedLayer(prod(actInfo.Dimension));
    softplusLayer(Name="splus") 
    ];

Assemble the dlnetwork object.

net = dlnetwork();
net = addLayers(net,inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

Connect the layers.

net = connectLayers(net,"infc","tanhMean/in");
net = connectLayers(net,"infc","tanhStdv/in");

Plot the network.

plot(net)

Initialize the network and display the number of learnable parameters (weights).

net = initialize(net);
summary(net)
   Initialized: true

   Number of learnables: 42

   Inputs:
      1   'netOin'   5 features
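The reported count of 42 learnable parameters can be reproduced by hand. Each fullyConnectedLayer with nIn inputs and nOut outputs has an nOut-by-nIn weight matrix and nOut biases. The input, tanh, scaling, and softplus layers in this network contribute no learnable parameters. The following base-MATLAB sketch does this arithmetic using a hypothetical helper, fcParams.

```matlab
% Hypothetical helper: learnable parameters of a fully connected layer
% (nOut-by-nIn weights plus nOut biases)
fcParams = @(nIn, nOut) nOut*nIn + nOut;

nObs = 5;   % prod(obsInfo.Dimension) in this example
nAct = 3;   % prod(actInfo.Dimension) in this example

nInFc   = fcParams(nObs, nAct);   % "infc" layer in the input path
nMeanFc = fcParams(nAct, nAct);   % fully connected layer in the mean path
nStdFc  = fcParams(nAct, nAct);   % fully connected layer in the standard deviation path

% The input, tanh, scaling, and softplus layers add no learnable parameters
nTotal = nInFc + nMeanFc + nStdFc;
disp(nTotal)
```

The three fully connected layers contribute 18, 12, and 12 parameters, respectively.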

Create the actor with rlContinuousGaussianActor, using the network, the observation and action specification objects, and the names of the network input and output layers.

actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    ActionMeanOutputNames="scale",...
    ActionStandardDeviationOutputNames="splus",...
    ObservationInputNames="netOin");

To check your actor, use getAction to return an action from a random observation vector, using the current network weights. Each of the three elements of the action vector is a random sample from the Gaussian distribution with mean and standard deviation calculated, as a function of the current observation, by the neural network.

act = getAction(actor,{rand(obsInfo.Dimension)}); 
act{1}
ans = 3x1 single column vector

  -12.0285
    1.7628
   10.8733
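Conceptually, each action element is the corresponding mean plus the corresponding standard deviation times a standard normal sample. This treats the elements as independent draws, which is suggested by the per-element means and standard deviations. The following base-MATLAB sketch illustrates this with hypothetical mean and standard deviation vectors. It is not the toolbox's internal implementation. Because a Gaussian distribution is unbounded, a sample can fall outside the action limits, as the first and third elements returned above do.

```matlab
% Hypothetical mean values and standard deviations for a 3-element action
mu    = [0; 2; -1];
sigma = [1; 0.5; 2];

rng(0)                                 % fix the seed so the sketch is repeatable
act = mu + sigma.*randn(size(mu));     % one sample per action element

% Over many samples, the empirical statistics approach mu and sigma
N = 1e5;
samples = mu + sigma.*randn(numel(mu), N);   % implicit expansion (R2016b or later)
empiricalMean = mean(samples, 2);
empiricalStd  = std(samples, 0, 2);
disp([mu empiricalMean sigma empiricalStd])
```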

To return the parameters of the Gaussian distribution of the action (the mean values and standard deviations) for a given observation, use evaluate.

dist = evaluate(actor,{rand(obsInfo.Dimension)});

Display the vector of mean values.

dist{1}
ans = 3x1 single column vector

   -5.6127
    3.9449
    9.6213

Display the vector of standard deviations.

dist{2}
ans = 3x1 single column vector

    0.8516
    0.8366
    0.7004

You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with continuous action and observation spaces, and use a continuous Gaussian actor, are rlACAgent, rlPGAgent, rlSACAgent, rlPPOAgent, and rlTRPOAgent.

For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

Version History

Introduced in R2022a