
SARSA reinforcement learning agent

Since R2019a


The SARSA algorithm is a model-free, online, on-policy, discrete action-space reinforcement learning method. A SARSA agent is a value-based reinforcement learning agent that trains a critic to estimate the expected discounted cumulative long-term reward of the current policy.


SARSA agents do not support recurrent networks.

For more information on SARSA agents, see SARSA Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.




agent = rlSARSAAgent(critic,agentOptions) creates a SARSA agent with the specified critic network and sets the AgentOptions property.

Input Arguments

Critic, specified as an rlQValueFunction object. For more information on creating critics, see Create Policies and Value Functions.


Agent options, specified as an rlSARSAAgentOptions object.

Option to use the exploration policy when selecting actions during simulation or after deployment, specified as one of the following logical values.

  • true — Use the base agent exploration policy when selecting actions in sim and generatePolicyFunction. Specifically, in this case the agent uses the rlEpsilonGreedyPolicy. Since the action selection has a random component, the agent explores its action and observation spaces.

  • false — Force the agent to use the base agent greedy policy (the action with the maximum Q-value) when selecting actions in sim and generatePolicyFunction. Specifically, in this case the agent uses the rlMaxQPolicy policy. Since the action selection is greedy, the policy behaves deterministically and the agent does not explore its action and observation spaces.


This option affects only simulation and deployment; it does not affect training. When you train an agent using train, the agent always uses its exploration policy independently of the value of this property.
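For example, the property can be toggled directly on an existing agent (a minimal sketch; the variable name agent is assumed):

```matlab
% Assuming an existing SARSA agent stored in the variable agent:
% keep the epsilon-greedy exploration policy active during sim
% and generatePolicyFunction
agent.UseExplorationPolicy = true;

% switch back to the deterministic greedy (max-Q) policy,
% for example before deployment
agent.UseExplorationPolicy = false;
```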

This property is read-only.

Observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

The value of ObservationInfo matches the corresponding value specified in critic.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Action specifications, specified as an rlFiniteSetSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.


Only one action channel is allowed.

If you create the agent by specifying a critic object, the value of ActionInfo matches the value specified in critic.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec.

Sample time of agent, specified as a positive scalar or as -1. Setting this parameter to -1 allows for event-based simulations.

Within a Simulink® environment, the RL Agent block in which the agent is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

Within a MATLAB® environment, the agent is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train. If SampleTime is -1, the time interval between consecutive elements in the returned output experience reflects the timing of the event that triggers the agent execution.

Example: SampleTime=-1
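As an illustrative sketch (assuming a critic object has already been created as shown in the example below), the sample time can be set through the agent options:

```matlab
% Hypothetical example: create a SARSA agent that executes
% every 0.1 seconds of simulation time
opt = rlSARSAAgentOptions("SampleTime",0.1);
agent = rlSARSAAgent(critic,opt);

% Switch to event-based execution (for example, in a
% triggered Simulink subsystem) by setting SampleTime to -1
agent.AgentOptions.SampleTime = -1;
```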

Object Functions

train - Train reinforcement learning agents within a specified environment
sim - Simulate trained reinforcement learning agents within a specified environment
getAction - Obtain action from agent, actor, or policy object given environment observations
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
generatePolicyFunction - Generate MATLAB function that evaluates policy of an agent or policy object


Create or load an environment interface. For this example, load the Basic Grid World environment interface, which is also used in the example Train Reinforcement Learning Agent in Basic Grid World.

env = rlPredefinedEnv("BasicGridWorld");

Get observation and action specifications.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

A SARSA agent uses a parametrized Q-value function to estimate the value of the policy. A Q-value function takes the current observation and an action as inputs and returns a single scalar as output (the estimated discounted cumulative long-term reward for taking the action from the state corresponding to the current observation, and following the policy thereafter).

Since both observation and action spaces are discrete and low-dimensional, use a table to model the Q-value function within the critic. rlTable creates a value table object from the observation and action specifications objects.

Create a table approximation model derived from the environment observation and action specifications.

qTable = rlTable(obsInfo,actInfo);

Create the Q-value function approximator object using qTable and the environment specification objects. For more information, see rlQValueFunction.

critic = rlQValueFunction(qTable,obsInfo,actInfo);

Create a SARSA agent using the approximator object.

agent = rlSARSAAgent(critic)
agent = 
  rlSARSAAgent with properties:

            AgentOptions: [1x1 rl.option.rlSARSAAgentOptions]
    UseExplorationPolicy: 0
         ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
              ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
              SampleTime: 1

Specify an Epsilon value of 0.05.

agent.AgentOptions.EpsilonGreedyExploration.Epsilon = 0.05;

To check your agent, use getAction to return the action from a random observation.

act = getAction(agent,{randi(numel(obsInfo.Elements))});
act{1}
ans = 1

You can now test and train the agent against the environment.
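A minimal training sketch follows (the stopping criteria and episode limits shown here are illustrative assumptions, not values from this example):

```matlab
% Illustrative sketch: train the SARSA agent against the
% grid world environment, stopping when the average reward
% reaches an assumed threshold
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=50, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=10);
trainStats = train(agent,env,trainOpts);

% Simulate the trained agent for up to 50 steps
simOpts = rlSimulationOptions(MaxSteps=50);
experience = sim(env,agent,simOpts);
```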

Version History

Introduced in R2019a