trainWithEvolutionStrategy
Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment
Since R2023b
Description
trainStats = trainWithEvolutionStrategy(agent,env,estOpts) trains agent within the environment env, using the evolution strategy training options object estOpts. Note that agent is a handle object and it is updated during training, despite being an input argument. For more information on the training algorithm, see Train agent with evolution strategy.
Examples
Train Agent Using Evolutionary Strategy
This example shows how to train a DDPG agent using an evolutionary strategy.
Load the predefined environment object representing a cart-pole system with a continuous action space. For more information on this environment, see Load Predefined Control System Environments.
env = rlPredefinedEnv("CartPole-Continuous");
The agent networks are initialized randomly. Ensure reproducibility by fixing the seed of the random generator.
rng(0)
Create a DDPG agent with default networks.
agent = rlDDPGAgent(getObservationInfo(env),getActionInfo(env));
To create an evolution strategy options object, use rlEvolutionStrategyTrainingOptions.
evsTrainingOpts = rlEvolutionStrategyTrainingOptions( ...
    PopulationSize=10, ...
    ReturnedPolicy="BestPolicy", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=496);
To train the agent, use trainWithEvolutionStrategy.
doTraining = false;
if doTraining
    trainStats = trainWithEvolutionStrategy(agent,env,evsTrainingOpts);
else
    load("rlTrainUsingESAgent.mat","agent");
end
Simulate the agent and display the episode reward.
simOptions = rlSimulationOptions(MaxSteps=500);
experience = sim(env,agent,simOptions);
totalReward = sum(experience.Reward)
totalReward = 497.8374
The agent is able to balance the cart-pole system for the whole episode.
Input Arguments
agent
— DDPG, TD3 or SAC agent
rlDDPGAgent object | rlTD3Agent object | rlSACAgent object
Agent to train, specified as an rlDDPGAgent, rlTD3Agent, or rlSACAgent object.
Note
trainWithEvolutionStrategy updates the agent as training progresses. For more information on how to preserve the original agent, how to save an agent during training, and on the state of agent after training, see the notes and the tips section in train. For more information about handle objects, see Handle Object Behavior.
For more information about how to create and configure agents for reinforcement learning, see Reinforcement Learning Agents.
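For example, as a minimal sketch, you can create an agent of a different supported type with default networks for the same environment (the choice of a TD3 agent here is illustrative, not part of the example above):

% Minimal sketch: create a TD3 agent with default networks (illustrative choice;
% rlDDPGAgent and rlSACAgent objects are created the same way).
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
td3Agent = rlTD3Agent(obsInfo,actInfo);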
env
— Environment
reinforcement learning environment object
Environment in which the agent acts, specified as one of the following kinds of reinforcement learning environment object:
A predefined MATLAB® or Simulink® environment created using rlPredefinedEnv.
A custom MATLAB environment you create with functions such as rlFunctionEnv or rlCreateEnvTemplate.
A custom Simulink environment you create using rlSimulinkEnv.
Note
Multiagent environments do not support training agents with an evolution strategy.
For more information about creating and configuring environments, see Reinforcement Learning Environments.
When env is a Simulink environment, calling trainWithEvolutionStrategy compiles and simulates the model associated with the environment.
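As a minimal sketch, assuming a Simulink model named "myModel" that contains an RL Agent block at "myModel/RL Agent" (both names are hypothetical), you could create a Simulink environment as follows:

% Minimal sketch: create a Simulink environment (model name, block path, and
% observation/action specification dimensions are hypothetical).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);
env = rlSimulinkEnv("myModel","myModel/RL Agent",obsInfo,actInfo);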
estOpts
— Parameters and options for training using an evolution strategy
rlEvolutionStrategyTrainingOptions object
Parameters and options for training using an evolution strategy, specified as an rlEvolutionStrategyTrainingOptions object. Use this argument to specify parameters and options such as:
Population size
Population update method
Number of training epochs
Criteria for saving candidate agents
How to display training progress
For details, see rlEvolutionStrategyTrainingOptions.
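For example, as a minimal sketch, you could configure a larger population and multiple evaluation episodes per individual (the property names appear on this page; the values are arbitrary illustrations):

% Minimal sketch: configure additional evolution strategy training options
% (values are arbitrary examples).
estOpts = rlEvolutionStrategyTrainingOptions( ...
    PopulationSize=25, ...
    MaxGenerations=100, ...
    EvaluationsPerIndividual=3, ...
    SimulationStorageType="memory");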
Output Arguments
trainStats
— Evolution strategy training results
rlEvolutionStrategyTrainingResult object
Evolution strategy training results, returned as an rlEvolutionStrategyTrainingResult object. The following properties pertain to the rlEvolutionStrategyTrainingResult object:
GenerationIndex
— Generation number
[1;2;…;N]
Generation number, returned as the column vector [1;2;…;N], where N is the number of generations in the training run. This vector is useful if you want to plot the evolution of other quantities from generation to generation.
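For example, as a minimal sketch, you can plot the reward obtained in each generation against the generation number (assuming trainStats is the result object returned by trainWithEvolutionStrategy, as in the example above):

% Minimal sketch: plot the per-generation reward versus the generation number.
plot(trainStats.GenerationIndex,trainStats.GenerationReward)
xlabel("Generation")
ylabel("Generation reward")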
GenerationReward
— Reward for each generation
column vector
Reward for each generation, returned in a column vector of length N. Each entry contains the reward for the corresponding generation.
AverageReward
— Average reward over the averaging window
column vector
Average reward over the averaging window specified in estOpts, returned as a column vector of length N. Each entry contains the average reward computed at the end of the corresponding generation.
Q0
— Critic estimate of expected discounted cumulative long-term reward at the beginning of each generation
column vector
Critic estimate of expected discounted cumulative long-term reward using the current agent and the environment initial conditions, returned as a column vector of length N. Each entry is the critic estimate (Q0) for the agent at the beginning of the corresponding generation.
SimulationInfo
— Environment simulation information
EvolutionStrategySimulationStorage object | []
Environment simulation information, returned as:
An EvolutionStrategySimulationStorage object, if SimulationStorageType is set to "memory" or "file".
An empty array, if SimulationStorageType is set to "none".
An EvolutionStrategySimulationStorage object contains information collected during simulation. You can access this information by indexing into the object using the generation number, citizen number, and episode number for the individual.
For example, if res is an rlEvolutionStrategyTrainingResult object returned by trainWithEvolutionStrategy, you can access the environment simulation information related to the fourth run (episode) of the third citizen in the second generation as:
mySimInfo234 = res.SimulationInfo(2,3,4)
For MATLAB environments, mySimInfo234 is a structure containing the field SimulationError. This structure contains any errors that occurred during simulation for the fourth episode of the third citizen in the second generation.
For Simulink environments, mySimInfo234 is a Simulink.SimulationOutput object containing simulation data. Recorded data includes any signals and states that the model is configured to log, simulation metadata, and any errors that occurred for the second generation, third citizen, and fourth run.
In both cases, mySimInfo234 also contains a StatusMessage field or property indicating that the corresponding run (episode) has terminated successfully.
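For example, as a minimal sketch, you can display the termination status of the run retrieved above:

% Minimal sketch: display the termination status of the retrieved run.
disp(mySimInfo234.StatusMessage)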
An EvolutionStrategySimulationStorage object also has the following read-only properties:
NumSimulations
— Total number of simulations
positive integer
Total number of simulations run during the entire training, returned as a positive integer. It is equal to the number of generations multiplied by the population size, multiplied by the number of simulation episodes per individual. These three numbers correspond to the MaxGenerations, PopulationSize, and EvaluationsPerIndividual properties of rlEvolutionStrategyTrainingOptions, respectively.
Example: 3000
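As a minimal sketch of this relationship, with the hypothetical option values MaxGenerations=100, PopulationSize=10, and EvaluationsPerIndividual=3:

% Minimal sketch: NumSimulations equals MaxGenerations times PopulationSize
% times EvaluationsPerIndividual (option values here are hypothetical).
expectedNumSimulations = 100*10*3   % returns 3000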
StorageType
— Type of storage for the environment data
"memory" (default) | "file"
Type of storage for the environment data, returned as either "memory" (indicating that data is stored in memory) or "file" (indicating that data is stored on disk). For more information, see the SimulationStorageType property of rlEvolutionStrategyTrainingOptions and Address Memory Issues During Training.
Example: "file"
TrainingOptions
— Training options set
rlEvolutionStrategyTrainingOptions object
Training options set, returned as an rlEvolutionStrategyTrainingOptions object.
Version History
Introduced in R2023b
R2024a: trainWithEvolutionStrategy now returns an rlEvolutionStrategyTrainingResult object instead of an rlTrainingResult object
Starting in R2024a, trainWithEvolutionStrategy returns an rlEvolutionStrategyTrainingResult object instead of an rlTrainingResult object.
Properties of the training result object that had Episode as part of their name have been replaced by corresponding properties that use Generation instead.
The SimulationInfo property of an rlEvolutionStrategyTrainingResult object (returned by trainWithEvolutionStrategy) is now an EvolutionStrategySimulationStorage object (unless SimulationStorageType is set to "none").
Consider a Simulink environment that logs its states as xout over 10 episodes.
Previously, if res was an rlTrainingResult object returned by trainWithEvolutionStrategy, you could pack the environment simulation information for all generations in a single array as:
[res.SimulationInfo.xout]
Similarly, previously you could access the simulation information related to the first generation as:
res.SimulationInfo(1).xout
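As a minimal sketch, with the new EvolutionStrategySimulationStorage object you instead index by generation, citizen, and episode; assuming the model logs its states as xout, the states of the first episode of the first citizen in the first generation could be retrieved as:

% Minimal sketch: access logged states with the new indexing scheme
% (the generation, citizen, and episode indices are illustrative).
firstGenInfo = res.SimulationInfo(1,1,1);
firstGenStates = firstGenInfo.xout;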