padsequences
Syntax
XPad = padsequences(X,paddingDim)
[XPad,mask] = padsequences(X,paddingDim)
[___] = padsequences(X,paddingDim,Name,Value)
Description
XPad = padsequences(X,paddingDim) pads the sequences in the cell array X along the dimension specified by paddingDim. The function adds padding at the end of each sequence to match the size of the longest sequence in X. The padded sequences are concatenated and the function returns XPad as an array.
[XPad,mask] = padsequences(X,paddingDim) additionally returns a logical array representing the positions of original sequence data in XPad. Values of true or 1 in mask correspond to the positions of original sequence data in XPad; values of false or 0 correspond to padded values.
[___] = padsequences(X,paddingDim,Name,Value) specifies options using one or more name-value arguments in addition to the input and output arguments in previous syntaxes. For example, 'Direction','left' adds padding to the beginning of the original sequence.
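The default behavior described above can be checked on a small case. The following is a minimal sketch that uses hypothetical toy data (two single-channel sequences of lengths 3 and 2) rather than data from this page, and assumes the toolbox that provides padsequences is installed.

```matlab
% Hypothetical toy data: one channel (row), time steps along dimension 2.
X = {[1 2 3], [4 5]};

% Pad along dimension 2 and also request the mask.
[XPad,mask] = padsequences(X,2);

size(XPad)     % 1-by-3-by-2: sequences are concatenated along a new last dimension
XPad(:,:,2)    % second sequence, zero-padded at the end: 4 5 0
mask(:,:,2)    % true for original data, false for padding: 1 1 0
```

The first sequence already has the longest length, so its mask is all true.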
Examples
Pad Sequence Data to Same Length
Pad sequence data ready for training.
Load the sequence data and view the sizes of the first few sequences. The sequences have different lengths.
load WaveformData
data(1:5)
ans=5×1 cell array
{103x3 double}
{136x3 double}
{140x3 double}
{124x3 double}
{127x3 double}
Pad the data with zeros to the same length as the longest sequence. The function applies the padding on the right side of the data. Specify the dimension containing the time steps as the padding dimension. For this example, the dimension is 1.
dataPadded = padsequences(data,1);
Examine the size of the padded sequences.
size(dataPadded)
ans = 1×3
200 3 1000
Pad or Truncate Both Sides of Sequence Data
Use padsequences to extend or cut each sequence to a fixed length by adding or removing data at both ends of the sequence, depending on the length of the original sequence.
Load the sequence data.
load WaveformData
View the sizes of the first few sequences. The sequences have different lengths.
data(1:10)
ans=10×1 cell array
{103x3 double}
{136x3 double}
{140x3 double}
{124x3 double}
{127x3 double}
{200x3 double}
{141x3 double}
{151x3 double}
{149x3 double}
{112x3 double}
Process the data so that each sequence is exactly 128 time steps. For shorter sequences, padding is required, while longer sequences need to be truncated. Pad or truncate at both sides of the data. For the padded sequences, apply symmetric padding so that the padded values are mirror reflections of the original sequence values.
[dataPadded,mask] = padsequences(data,1,'Length',128,'Direction','both','PaddingValue','symmetric');
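To see how a fixed length combined with 'Direction','both' treats short and long sequences, a smaller case may help. This sketch uses hypothetical toy column sequences of lengths 3 and 9 with a target length of 6, and the default zero padding rather than symmetric padding, so that the result depends only on behavior stated on this page.

```matlab
% Hypothetical toy data: time steps along dimension 1, one feature.
X = {(1:3)', (1:9)'};

% Force every sequence to 6 time steps, padding or truncating at both ends.
[XPad,mask] = padsequences(X,1,'Length',6,'Direction','both');

size(XPad)          % 6-by-1-by-2
nnz(mask(:,1,1))    % 3: the short sequence keeps all of its original values
all(mask(:,1,2))    % true: the truncated sequence contains no padding
```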
Compare some of the padded sequences with the original sequence. Each observation contains three features, so extract a single feature to compare.
View the size of the first observation. This sequence is shorter than 128 time steps.
idx = 1; size(data{idx})
ans = 1×2
103 3
View the size of the padded array.
size(dataPadded)
ans = 1×3
128 3 1000
The function centers the sequence and pads at both ends by reflecting the values at the ends of the sequence. The mask shows the location of the original sequence values. View the first and last few time steps of the mask.
mask(1:20,1,idx)
ans = 20x1 logical array
0
0
0
0
0
0
0
0
0
0
⋮
mask(end-19:end,1,idx)
ans = 20x1 logical array
1
1
1
1
1
1
1
0
0
0
⋮
View the size of the third observation. This sequence is longer than 128 time steps.
idx = 3; size(data{idx})
ans = 1×2
140 3
The function centers the sequence and truncates at both ends. The mask shows that all data in the resulting sequence is part of the original sequence. View the first and last few time steps of the mask.
mask(1:20,1,idx)
ans = 20x1 logical array
1
1
1
1
1
1
1
1
1
1
⋮
mask(end-19:end,1,idx)
ans = 20x1 logical array
1
1
1
1
1
1
1
1
1
1
⋮
Pad Mini-Batches of Sequences for Custom Training Loop
Use the padsequences function in conjunction with minibatchqueue to prepare and preprocess sequence data ready for training using a custom training loop.
The example uses the human activity recognition training data. The data contains six time series of sensor data obtained from a smartphone worn on the body. Each sequence has three features and varies in length. The three features correspond to the accelerometer readings in three different directions.
Load the training data. Combine the data and labels into a single datastore.
s = load("HumanActivityTrain.mat");
dsXTrain = arrayDatastore(s.XTrain,'OutputType','same');
dsYTrain = arrayDatastore(s.YTrain,'OutputType','same');
dsTrain = combine(dsXTrain,dsYTrain);
Use minibatchqueue to process the mini-batches of sequence data. Define a custom mini-batch preprocessing function preprocessMiniBatch (defined at the end of this example) to pad the sequence data and labels, and one-hot encode the label sequences. To also return the mask of the padded data, specify three output variables for the minibatchqueue object.
miniBatchSize = 2;
mbq = minibatchqueue(dsTrain,3,...
    'MiniBatchSize',miniBatchSize,...
    'MiniBatchFcn',@preprocessMiniBatch);
Check the size of the mini-batches.
[X,Y,mask] = next(mbq); size(X)
ans = 1×3
3 64480 2
size(mask)
ans = 1×3
3 64480 2
Each mini-batch has two observations. The function pads the sequences to the same size as the longest sequence in the mini-batch. The mask is the same size as the padded sequences, and shows the location of the original data values in the padded sequence data.
size(Y)
ans = 1×3
5 64480 2
The padded labels are one-hot encoded into numeric data ready for training.
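function [xPad,yPad,mask] = preprocessMiniBatch(X,Y)
    % Pad the sequences along the time dimension (2) and return the mask.
    [xPad,mask] = padsequences(X,2);
    % Pad the label sequences the same way, then one-hot encode along dimension 1.
    yPad = padsequences(Y,2);
    yPad = onehotencode(yPad,1);
end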
Input Arguments
X - Sequences to pad
cell vector
Sequences to pad, specified as a cell vector of numeric or categorical arrays.
Data Types: cell
paddingDim - Dimension along which to pad
positive integer
Dimension along which to pad input sequence data, specified as a positive integer.
Example: 2
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: padsequences(X,paddingDim,'Length','shortest','Direction','both') truncates the sequences at each end to match the length of the shortest input sequence.
Length - Length of padded sequences
'longest' (default) | 'shortest' | positive integer
Length of padded sequences, specified as one of the following:
'longest' - Pad each input sequence to the same length as the longest input sequence.
'shortest' - Truncate each input sequence to the same length as the shortest input sequence.
Positive integer - Pad or truncate each input sequence to the specified length.
Example: padsequences(X,paddingDim,'Length','shortest')
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | char | string
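The three Length settings can be compared by looking only at the output sizes. This is a minimal sketch with hypothetical random data: three sequences of 5, 3, and 8 time steps, each with two features.

```matlab
% Hypothetical toy data: time steps along dimension 1, two features.
X = {rand(5,2), rand(3,2), rand(8,2)};

XLongest  = padsequences(X,1);                      % default 'longest'
XShortest = padsequences(X,1,'Length','shortest');  % truncate to 3 time steps
XFixed    = padsequences(X,1,'Length',6);           % pad or truncate to 6 time steps

size(XLongest)     % 8 2 3
size(XShortest)    % 3 2 3
size(XFixed)       % 6 2 3
```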
Direction - Direction of padding or truncation
'right' (default) | 'left' | 'both'
Direction of padding or truncation, specified as one of the following:
'right' - Pad or truncate at the end of each original sequence.
'left' - Pad or truncate at the beginning of each original sequence.
'both' - Pad or truncate at the beginning and end of each original sequence. Half the required padding or truncation is applied to each end of the sequence.
Example: padsequences(X,paddingDim,'Direction','both')
Data Types: char | string
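The difference between 'right' and 'left' is easiest to see on sequences short enough to read directly. The sketch below uses hypothetical toy data and shows the option applied both to padding and to truncation.

```matlab
% Hypothetical toy data: one channel, time steps along dimension 2.
X = {[1 2 3], [4 5]};

% Padding: 'left' adds the padding before the original values.
padLeft = padsequences(X,2,'Direction','left');
padLeft(:,:,2)     % 0 4 5

% Truncation to 2 time steps: 'right' removes values at the end,
% 'left' removes values at the beginning.
cutRight = padsequences(X,2,'Length',2,'Direction','right');
cutLeft  = padsequences(X,2,'Length',2,'Direction','left');
cutRight(:,:,1)    % 1 2
cutLeft(:,:,1)     % 2 3
```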
PaddingValue - Value used to pad input
'auto' (default) | 'symmetric' | numeric scalar | categorical scalar
Value used to pad input, specified as one of the following:
'auto' - Determine the padding value automatically depending on the data type of the input sequences. Numeric sequences are padded with 0. Categorical sequences are padded with <undefined>.
'symmetric' - Pad each sequence with a mirror reflection of itself.
Numeric scalar - Pad each sequence with the specified numeric value.
Categorical scalar - Pad each sequence with the specified categorical value.
Example: padsequences(X,paddingDim,'PaddingValue','symmetric')
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | categorical
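As an illustration of the numeric padding options, the following sketch pads the same hypothetical toy sequences with the default value, with a custom scalar, and with symmetric reflection. The exact reflected values are not asserted here, only that the original data is left in place.

```matlab
% Hypothetical toy data: one channel, time steps along dimension 2.
X = {[1 2 3], [4 5]};

padAuto = padsequences(X,2);                             % numeric input, so pad with 0
padNeg  = padsequences(X,2,'PaddingValue',-1);           % pad with a chosen scalar
padSym  = padsequences(X,2,'PaddingValue','symmetric');  % pad with reflected data

padAuto(:,:,2)     % 4 5 0
padNeg(:,:,2)      % 4 5 -1
padSym(:,1:2,2)    % 4 5 (original values, followed by one reflected value)
```

A non-zero padding value such as -1 can be useful when 0 is a legitimate data value and the mask is not carried along.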
UniformOutput - Flag to return padded data as uniform array
true or 1 (default) | false or 0
Flag to return padded data as a uniform array, specified as a numeric or logical 1 (true) or 0 (false). When you set the value to 0, XPad is returned as a cell vector with the same size and underlying data type as the input X.
Example: padsequences(X,paddingDim,'UniformOutput',0)
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical
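When the padded sequences should stay separate, for example to feed code that expects a cell array, UniformOutput can be set to false. A minimal sketch with hypothetical toy data:

```matlab
% Hypothetical toy data: one channel, time steps along dimension 2.
X = {[1 2 3], [4 5]};

% Return the padded sequences and the masks as cell vectors instead of arrays.
[XPad,mask] = padsequences(X,2,'UniformOutput',false);

class(XPad)    % cell
size(XPad)     % 1 2, the same size as X
XPad{2}        % 4 5 0
mask{2}        % 1 1 0
```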
Output Arguments
XPad - Padded sequence data
numeric array | categorical array | cell vector
Padded sequence data, returned as a numeric array, categorical array, or a cell vector of numeric or categorical arrays.
If you set the UniformOutput name-value option to true or 1, the function concatenates the padded sequences over the last dimension. The last dimension of XPad has the same size as the number of sequences in input X. XPad is an array with N + 1 dimensions, where N is the number of dimensions of the sequence arrays in X. XPad has the same data type as the arrays in input X.
If you set the UniformOutput name-value option to false or 0, the function returns the padded sequences as a cell vector with the same size and underlying data type as the input X.
mask - Position of original sequence data
logical array | cell vector
Position of original sequence data in the padded sequences, returned as a logical array or as a cell vector of logical arrays.
mask has the same size as XPad and the same container type: a logical array when XPad is an array, or a cell vector of logical arrays when XPad is a cell vector. Values of 1 in mask correspond to positions of original sequence values in XPad. Values of 0 correspond to padded values.
Use mask to exclude padded values from loss calculations using the "Mask" name-value option in the crossentropy function.
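To illustrate why excluding padded values matters, the sketch below computes a loss by hand with and without the mask. It does not call crossentropy or reproduce its implementation. It uses squared error as a stand-in loss, and the predictions and targets are hypothetical toy values.

```matlab
% Hypothetical toy predictions and targets: one channel, time along dimension 2.
pred   = {[0.9 0.8 0.7], [0.6 0.5]};
target = {[1 1 1], [1 1]};

% Pad both to a common length; the mask marks the original time steps.
[predPad,mask] = padsequences(pred,2);
targetPad = padsequences(target,2);

% Element-wise squared error. Padded positions contribute zero error here.
err = (predPad - targetPad).^2;

naiveMean  = mean(err,'all');            % averages over 6 elements, one of them padding
maskedMean = sum(err(mask))/nnz(mask);   % averages over the 5 original elements only
```

The naive mean divides by the padded element count (about 0.092), so the padding dilutes the loss. The masked mean divides only by the number of original values (0.11).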
Version History
Introduced in R2021a