rlQValueFunction
Q-Value function approximator object for reinforcement learning agents
Description
This object implements a Q-value function approximator that you can use as a critic for a reinforcement learning agent. A Q-value function maps an environment state-action pair to a scalar value representing the predicted discounted cumulative long-term reward when the agent starts from the given state and executes the given action. A Q-value function critic therefore needs both the environment state and an action as inputs. After you create an rlQValueFunction critic, use it to create an agent such as rlQAgent, rlDQNAgent, rlSARSAAgent, rlDDPGAgent, or rlTD3Agent. For more information on creating representations, see Create Policies and Value Functions.
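For example, once you have a critic (see Creation below), you can build a DQN agent from it with default agent options. This is a minimal sketch, not a complete workflow:

% Create a DQN agent directly from an existing Q-value function critic.
agent = rlDQNAgent(critic);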
Creation
Syntax
Description
critic = rlQValueFunction(net,observationInfo,actionInfo)

creates the Q-value function object critic. Here, net is the deep neural network used as an approximator; it must have both observation and action as inputs and a single scalar output. The network input layers are automatically associated with the environment observation and action channels according to the dimension specifications in observationInfo and actionInfo. This function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
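The following is a minimal sketch of this syntax. The layer sizes, layer names, and specification dimensions are assumptions chosen for illustration, not values taken from this page.

% Observation: 4-element vector; action: 2-element vector (illustrative sizes).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% One input path per channel, merged into a single scalar Q-value output.
obsPath = [featureInputLayer(4,Name="obsIn")
           fullyConnectedLayer(16,Name="obsFC")];
actPath = [featureInputLayer(2,Name="actIn")
           fullyConnectedLayer(16,Name="actFC")];
commonPath = [additionLayer(2,Name="add")
              reluLayer
              fullyConnectedLayer(1,Name="QValue")];

lgraph = layerGraph(obsPath);
lgraph = addLayers(lgraph,actPath);
lgraph = addLayers(lgraph,commonPath);
lgraph = connectLayers(lgraph,"obsFC","add/in1");
lgraph = connectLayers(lgraph,"actFC","add/in2");
net = dlnetwork(lgraph);

critic = rlQValueFunction(net,obsInfo,actInfo);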
critic = rlQValueFunction(net,observationInfo,actionInfo,ObservationInputNames=netObsNames,ActionInputNames=netActName)

specifies the names of the network input layers to be associated with the environment observation and action channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames, and the environment action channel specified in actionInfo to the layer specified by the string netActName. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo. Furthermore, the network input layer indicated by netActName must have the same data type and dimensions as the action specifications defined in actionInfo.
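As a sketch, reusing the network and specifications assumed in the previous example, the input layer names can be supplied explicitly instead of relying on automatic association:

% Associate the named input layers with the observation and action channels.
critic = rlQValueFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obsIn",ActionInputNames="actIn");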
critic = rlQValueFunction(tab,observationInfo,actionInfo)

creates the Q-value function object critic with discrete action and observation spaces from the Q-value table tab. tab is an rlTable object containing a table with as many rows as the possible observations and as many columns as the possible actions. The function sets the ObservationInfo and ActionInfo properties of critic respectively to the observationInfo and actionInfo input arguments, which in this case must be scalar rlFiniteSetSpec objects.
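A minimal sketch, assuming a small environment with four possible observations and two possible actions (the element values are illustrative):

% Discrete observation and action specifications.
obsInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo = rlFiniteSetSpec([-1 1]);

tab = rlTable(obsInfo,actInfo);   % 4-by-2 Q-value table, initialized to zeros
tab.Table = rand(4,2);            % optionally seed the table with initial values

critic = rlQValueFunction(tab,obsInfo,actInfo);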
critic = rlQValueFunction({basisFcn,W0},observationInfo,actionInfo)

creates a Q-value function object critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell array whose first element is the handle basisFcn to a custom basis function and whose second element is the initial weight vector W0. Here, the basis function must have both observation and action as inputs, and W0 must be a column vector. The function sets the ObservationInfo and ActionInfo properties of critic to the observationInfo and actionInfo input arguments, respectively.
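A minimal sketch with a hypothetical hand-crafted basis, so that the approximated value is W'*basisFcn(obs,act); the specification sizes and basis choice are assumptions:

% Observation: 3-element vector; action: scalar (illustrative sizes).
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

basisFcn = @(obs,act) [obs; act; obs.*obs];   % must return a column vector
W0 = zeros(7,1);                              % one initial weight per basis element

critic = rlQValueFunction({basisFcn,W0},obsInfo,actInfo);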
critic = rlQValueFunction(___,UseDevice=useDevice)

specifies the device used to perform computational operations on the critic object, and sets the UseDevice property of critic to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
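For example, a sketch that requests GPU execution for the network-based critic assumed above (requires a supported GPU):

critic = rlQValueFunction(net,obsInfo,actInfo,UseDevice="gpu");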
Input Arguments
Properties
Object Functions
rlDDPGAgent | Deep deterministic policy gradient reinforcement learning agent
rlTD3Agent | Twin-delayed deep deterministic policy gradient reinforcement learning agent
rlDQNAgent | Deep Q-network reinforcement learning agent
rlQAgent | Q-learning reinforcement learning agent
rlSARSAAgent | SARSA reinforcement learning agent
rlSACAgent | Soft actor-critic reinforcement learning agent
getValue | Obtain estimated value from a critic given environment observations and actions
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
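As a usage sketch, the network-based critic assumed earlier can be evaluated for a single observation-action pair (the random inputs are placeholders):

% Query the critic for the estimated Q-value of one observation-action pair.
obs = {rand(4,1)};
act = {rand(2,1)};
q = getValue(critic,obs,act);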