rlReplayMemory
Replay memory experience buffer
Description
An off-policy reinforcement learning agent stores experiences in an experience buffer. During training, the agent samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators. When you create a custom off-policy reinforcement learning agent, you can create a circular experience buffer using an rlReplayMemory object.
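For instance, inside a custom agent the buffer is typically filled after each environment step and sampled during each learning step. The following sketch illustrates that cycle; it assumes obsInfo and actInfo are existing continuous (rlNumericSpec) specification objects and that agent-specific update code replaces the final comment.

% Sketch of the store-then-sample cycle in a custom agent.
% Assumes obsInfo and actInfo are existing rlNumericSpec objects.
buffer = rlReplayMemory(obsInfo,actInfo,10000);

% After each environment step, store the latest experience.
exp.Observation     = {rand(obsInfo.Dimension)};
exp.Action          = {rand(actInfo.Dimension)};
exp.NextObservation = {rand(obsInfo.Dimension)};
exp.Reward          = 0;
exp.IsDone          = 0;
append(buffer,exp);

% During each learning step, sample a mini-batch once enough experiences are stored.
if buffer.Length >= 64
    miniBatch = sample(buffer,64);
    % Update the actor and critic approximators using miniBatch.
end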
Creation
Description
buffer = rlReplayMemory(obsInfo,actInfo,maxLength) creates a replay memory experience buffer that is compatible with the observation and action specifications obsInfo and actInfo and can store up to maxLength experiences. If you omit maxLength, the buffer uses the default maximum length of 10,000.
Input Arguments
obsInfo
— Observation specifications
specification object | array of specification objects
Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.
You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
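For example, the following sketch extracts the observation specifications from a predefined environment and, alternatively, constructs them manually. The environment name and dimensions are illustrative assumptions.

% Extract observation specifications from an existing environment.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);

% Alternatively, construct the specifications manually.
obsInfo = rlNumericSpec([4 1]);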
actInfo
— Action specifications
specification object | array of specification objects
Action specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the action signals.
You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.
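Similarly, you can extract or construct the action specifications. In this sketch, the environment name and action values are illustrative assumptions.

% Extract action specifications from an existing environment.
env = rlPredefinedEnv("CartPole-Discrete");
actInfo = getActionInfo(env);

% Alternatively, construct a discrete action specification manually.
actInfo = rlFiniteSetSpec([-10 10]);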
Properties
MaxLength
— Maximum buffer length
10000 (default) | positive integer
This property is read-only.
Maximum buffer length, specified as a positive integer.
Length
— Number of experiences in buffer
0 (default) | nonnegative integer
This property is read-only.
Number of experiences in buffer, specified as a nonnegative integer.
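For example, after appending experiences you can query both read-only properties (a sketch assuming buffer is an existing rlReplayMemory object):

buffer.MaxLength   % maximum number of experiences the buffer can hold
buffer.Length      % number of experiences currently stored in the buffer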
Object Functions
append — Append experiences to replay memory buffer
sample — Sample experiences from replay memory buffer
Examples
Create Experience Buffer
Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.
obsInfo = rlNumericSpec([3 1],...
    LowerLimit=0,...
    UpperLimit=[1;5;10]);
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 20,000.
buffer = rlReplayMemory(obsInfo,actInfo,20000);
Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, next observation, reward, and is-done.
For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone value to 0.
exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
exp.Action = {actInfo.UpperLimit.*rand(2,1)};
exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
exp.Reward = 10*rand(1);
exp.IsDone = 0;
Append the experience to the buffer.
append(buffer,exp);
You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.
for i = 1:100
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).IsDone = 0;
end
expBatch(100).IsDone = 1;

append(buffer,expBatch);
After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 50 experiences from the buffer.
miniBatch = sample(buffer,50);
You can sample a horizon of data from the buffer. For example, sample a horizon of 10 consecutive experiences with a discount factor of 0.95.
horizonSample = sample(buffer,1,...
    NStepHorizon=10,...
    DiscountFactor=0.95);
The returned sample includes the following information.
- Observation and Action are the observation and action from the first experience in the horizon.
- NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.
- Reward is the cumulative reward across the horizon using the specified discount factor.
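For instance, you can inspect the aggregated fields of the returned structure (assuming the horizon sample call above succeeded):

horizonSample.Reward   % cumulative discounted reward over the 10-step horizon
horizonSample.IsDone   % termination signal from the final experience in the horizon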
You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.
sequenceSample = sample(buffer,1,...
    SequenceLength=20);
Create Experience Buffer with Multiple Observation Channels
Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.
obsContinuous = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 5,000.
buffer = rlReplayMemory(obsInfo,actInfo,5000);
Append a sequence of 50 random experiences to the buffer.
for i = 1:50
    exp(i).Observation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    exp(i).NextObservation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Reward = 10*rand(1);
    exp(i).IsDone = 0;
end

append(buffer,exp);
After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 10 experiences from the buffer.
miniBatch = sample(buffer,10);
Version History