Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.
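One way to create them (the variable names obsInfo and actInfo are assumptions; any valid workspace names work):

% Continuous four-dimensional observation space
obsInfo = rlNumericSpec([4 1]);
% Continuous two-dimensional action space
actInfo = rlNumericSpec([2 1]);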
Alternatively, you can use getObservationInfo and getActionInfo to extract the specification objects from an environment.
Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.
To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimensions of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.
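A minimal sketch of such a network follows. The LSTM hidden size of 2 is an assumption, chosen to be consistent with the 2-by-1 network state displayed later in this example; a larger layer works equally well.

layers = [
    sequenceInputLayer(obsInfo.Dimension(1))   % input size = observation dimension
    lstmLayer(2)                               % recurrent layer (hidden size assumed)
    fullyConnectedLayer(actInfo.Dimension(1))  % output size = action dimension
    ];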
Convert the network to a dlnetwork object and display the number of learnable parameters.
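For example, using summary (the variable name model is an assumption):

model = dlnetwork(layers);
summary(model)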
Initialized: true
Number of learnables: 62
Inputs:
1 'sequenceinput' Sequence input with 4 dimensions
Create the actor using the model and the observation and action specification objects.
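For example:

actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)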
actor =
rlContinuousDeterministicActor with properties:
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
UseDevice: "cpu"
Check the actor with a random observation input.
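One way to do this, wrapping a random observation in a cell array (the displayed values depend on the random network initialization and observation, so your numbers may differ):

act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}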
ans = 2x1 single column vector
0.0568
0.0691
Create an additive noise policy object from actor.
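For instance:

policy = rlAdditiveNoisePolicy(actor)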
policy =
rlAdditiveNoisePolicy with properties:
Actor: [1x1 rl.function.rlContinuousDeterministicActor]
NoiseType: "gaussian"
NoiseOptions: [1x1 rl.option.GaussianActionNoise]
EnableNoiseDecay: 1
UseNoisyAction: 1
ObservationInfo: [1x1 rl.util.rlNumericSpec]
ActionInfo: [1x1 rl.util.rlNumericSpec]
SampleTime: -1
Use dot notation to set the standard deviation decay rate.
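For example (the value 0.9 is an illustrative choice):

policy.NoiseOptions.StandardDeviationDecayRate = 0.9;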
Use getAction to generate an action from the policy, given a random observation input.
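For example:

act = getAction(policy,{rand(obsInfo.Dimension)});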
Display the state of the recurrent neural network in the policy object.
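A possible way to query the state is sketched below; calling getState directly on the policy object is an assumption, and xNN is an arbitrary variable name.

xNN = getState(policy);  % state of the recurrent network (assumed syntax)
xNN{1}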
ans = 2x1 single column vector
0
0
Use getAction to also return the updated policy as a second output argument.
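For example (the name updatedPolicy is an assumption):

[act,updatedPolicy] = getAction(policy,{rand(obsInfo.Dimension)});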
Display the state of the recurrent neural network in the updated policy object.
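Again treating getState support for policy objects as an assumption:

xNN = getState(updatedPolicy);  % state after processing the observation (assumed syntax)
xNN{1}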
ans = 2x1 single column vector
0.3327
-0.2479
As expected, the state is updated.