Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.
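One way to create them (the variable names obsInfo and actInfo are assumptions; any valid workspace names work):

% Continuous four-dimensional observation space
obsInfo = rlNumericSpec([4 1]);
% Continuous two-dimensional action space
actInfo = rlNumericSpec([2 1]);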
Alternatively, you can use getObservationInfo and getActionInfo to extract the specification objects from an environment.
Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.
To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimensions of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.
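A minimal sketch of such a network follows. The LSTM hidden size of 2 is an assumption, chosen to be consistent with the 2-by-1 network state displayed later in this example; a larger layer works equally well.

layers = [
    sequenceInputLayer(obsInfo.Dimension(1))   % input size = observation dimension
    lstmLayer(2)                               % recurrent layer (hidden size assumed)
    fullyConnectedLayer(actInfo.Dimension(1))  % output size = action dimension
    ];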
Convert the network to a dlnetwork object and display the number of learnable parameters.
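For example, using summary (the variable name model is an assumption):

model = dlnetwork(layers);
summary(model)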
Initialized: true
Number of learnables: 62
Inputs:
1 'sequenceinput' Sequence input with 4 dimensions
Create the actor using the model and the observation and action specification objects.
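For example:

actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)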
actor =
rlContinuousDeterministicActor with properties:
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
UseDevice: "cpu"
Check the actor with a random observation input.
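One way to do this, wrapping a random observation in a cell array (the displayed values depend on the random network initialization and observation, so your numbers may differ):

act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}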
ans = 2x1 single column vector
0.0568
0.0691
Create an additive noise policy object from actor.
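For instance:

policy = rlAdditiveNoisePolicy(actor)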
policy =
rlAdditiveNoisePolicy with properties:
Actor: [1x1 rl.function.rlContinuousDeterministicActor]
NoiseType: "gaussian"
NoiseOptions: [1x1 rl.option.GaussianActionNoise]
EnableNoiseDecay: 1
UseNoisyAction: 1
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
SampleTime: -1
Use dot notation to set the standard deviation decay rate.
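For example (the value 0.9 is an illustrative choice):

policy.NoiseOptions.StandardDeviationDecayRate = 0.9;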
Use getAction to generate an action from the policy, given a random observation input.
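For example:

act = getAction(policy,{rand(obsInfo.Dimension)});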
Display the state of the recurrent neural network in the policy object.
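A possible way to query the state is sketched below; calling getState directly on the policy object is an assumption, and xNN is an arbitrary variable name.

xNN = getState(policy);  % state of the recurrent network (assumed syntax)
xNN{1}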
ans = 2x1 single column vector
0
0
Use getAction to also return the updated policy as a second output argument.
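For example (the name updatedPolicy is an assumption):

[act,updatedPolicy] = getAction(policy,{rand(obsInfo.Dimension)});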
Display the state of the recurrent neural network in the updated policy object.
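Again treating getState support for policy objects as an assumption:

xNN = getState(updatedPolicy);  % state after processing the observation (assumed syntax)
xNN{1}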
ans = 2x1 single column vector
0.3327
-0.2479
As expected, the state is updated.