Deep Learning Data Formats
Most deep learning networks and functions operate on different dimensions of the input data in different ways.
For example, an LSTM operation iterates over the time dimension of the input data, and a batch normalization operation normalizes over the batch dimension of the input data.
Data can have many different types of layouts:
- Data can have different numbers of dimensions. For example, you can represent image and video data as 4-D and 5-D arrays, respectively.
- Dimensions of data can represent different things. For example, image data has two spatial dimensions, one channel dimension, and one batch dimension.
- Data can have dimensions in multiple permutations. For example, you can represent a batch of sequences as a 3-D array with dimensions corresponding to channels, time steps, and observations. These dimensions can be in any order.
To ensure that the software operates on the correct dimensions, you can provide data layout information in different ways:
Option | Scenario | Usage |
---|---|---|
Provide data with dimensions in a specific permutation | Network with an input layer and the data has the required layout. | Pass data directly to the network or function. |
Provide data with labeled dimensions | Network with an input layer and the data does not have the required layout. | Create formatted dlarray objects. |
Provide data with labeled dimensions | Deep learning model defined as a function that uses multiple deep learning operations. | Create formatted dlarray objects. |
Provide data with labeled dimensions | Custom layer that uses multiple deep learning operations. | Create a layer that inherits from nnet.layer.Formattable. |
Provide data with additional layout information | Deep learning functions that require layout information, and you want to preserve the layout of the data. | Specify layout information using the appropriate input argument. For example, the DataFormat argument of dlconv. |
Provide data with additional layout information | Model functions where dimensions change between functions. For example, when one function must treat the third dimension as time, and a second function must treat the third dimension as spatial. | Specify layout information using the appropriate input argument. For example, the DataFormat argument of dlconv. |
To provide input data with labeled dimensions or additional layout information, you can use data formats.
A data format is a string of characters, where each character describes the type of the corresponding data dimension.
The characters are:
- "S": Spatial
- "C": Channel
- "B": Batch
- "T": Time
- "U": Unspecified
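To see how each character labels one dimension, here is a minimal sketch, assuming MATLAB with Deep Learning Toolbox. The array sizes are arbitrary, and the sketch uses the dlarray function finddim to look up which dimensions carry a given label.

```matlab
% Hypothetical batch of 16 RGB images of size 28-by-28, stored as
% height-by-width-by-channel-by-batch (arbitrary example sizes).
X = rand(28,28,3,16);

% Label the dimensions: two spatial, one channel, one batch.
X = dlarray(X,"SSCB");

% finddim returns the indices of the dimensions that have a given label.
spatialDims = finddim(X,"S")   % dimensions 1 and 2
channelDim  = finddim(X,"C")   % dimension 3
batchDim    = finddim(X,"B")   % dimension 4
timeDim     = finddim(X,"T")   % empty: X has no time dimension
```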
For example, consider an array containing a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can specify that this array has the format "CBT" (channel, batch, time).
For dlnetwork objects with input layers, or when you use the trainnet function, if your data already has the layout that the network requires, then the easiest option is usually to provide the input data with its dimensions in that permutation. In this case, you can input your data directly without specifying layout information. The required format depends on the type of input layer.
Layer | Format |
---|---|
Feature input layer | "BC" |
2-D image input layer | "SSCB" |
3-D image input layer | "SSSCB" |
Sequence input layer (vector sequences) | "TCB" |
Sequence input layer (1-D image sequences) | "SCBT" |
Sequence input layer (2-D image sequences) | "SSCBT" |
Sequence input layer (3-D image sequences) | "SSSCBT" |
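When the data holds the right dimensions in a different order, one way to reach the required layout is to permute it yourself before passing it to the network. The following minimal sketch assumes a hypothetical batch of images stored batch-first (observations-by-height-by-width-by-channels) and uses the MATLAB permute function to reorder it into the "SSCB" layout listed for a 2-D image input layer.

```matlab
% Hypothetical input: 16 RGB images of size 28-by-28, stored
% batch-first as observations-by-height-by-width-by-channels.
numObs = 16;
Xraw = rand(numObs,28,28,3);

% Move the batch dimension to the end to obtain the
% height-by-width-by-channel-by-batch ("SSCB") layout.
X = permute(Xraw,[2 3 4 1]);

size(X)   % 28 28 3 16
```

Permuting only reorders the dimensions, so each image is unchanged: X(:,:,:,k) holds the same values as Xraw(k,:,:,:).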
When your data has a different layout, providing formatted data or data format information can be easier than reshaping and preprocessing your data. For example, if you have sequence data where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, then you can specify the format "CBT" instead of permuting and preprocessing the data to have the layout that the software requires.
To create formatted input data, create a dlarray object and specify the format using the fmt argument. For example, for an array X that represents a batch of sequences, where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively, specify:
X = dlarray(X,"CBT");
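A minimal sketch of this step, assuming MATLAB with Deep Learning Toolbox and hypothetical sizes of 3 channels, 16 observations, and 10 time steps. The dims function reports the format attached to the array.

```matlab
% Hypothetical data: 3 channels, 16 observations, 10 time steps,
% stored as channels-by-observations-by-time.
X = rand(3,16,10);

% Attach the "CBT" format to the data.
X = dlarray(X,"CBT");

dims(X)   % 'CBT'
size(X)   % 3 16 10: "CBT" already follows the S, C, B, T, U order
```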
Note
When you create a formatted dlarray object, the software automatically permutes the dimensions so that the format has dimensions in this order: "S", "C", "B", "T", "U".
For example, if you specify a format of "TCB" (time, channel, batch), then the software automatically permutes the dimensions so that the array has the format "CBT" (channel, batch, time).
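The following minimal sketch illustrates this automatic permutation, assuming MATLAB with Deep Learning Toolbox and hypothetical sizes. It compares the data stored in the formatted dlarray (retrieved with extractdata) with a manual permute of the original array.

```matlab
% Hypothetical data stored as time-by-channel-by-batch:
% 10 time steps, 3 channels, 16 observations.
Xraw = rand(10,3,16);

% Specify the format in the order in which the data is stored.
X = dlarray(Xraw,"TCB");

% The software reorders the dimensions to "CBT".
dims(X)          % 'CBT'
size(X)          % 3 16 10
finddim(X,"T")   % 3

% The values are permuted, not reshaped: channel, batch, time.
isequal(extractdata(X),permute(Xraw,[2 3 1]))   % true
```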
To provide additional layout information with unformatted data to deep learning operations, specify the format using the appropriate input argument of the function. For example, to apply the dlconv operation to an unformatted dlarray object X that represents a batch of images, where the first two dimensions correspond to the spatial dimensions and the third and fourth dimensions correspond to the channel and batch dimensions, respectively, specify:
Y = dlconv(X,weights,bias,DataFormat="SSCB");
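A self-contained version of this dlconv call might look like the following minimal sketch, assuming MATLAB with Deep Learning Toolbox. The image size, the 3-by-3 filters, and the number of filters are arbitrary choices for illustration. The sketch also contrasts the unformatted call with a formatted dlarray input, where dlconv reads the layout from the data and the output keeps the format.

```matlab
% Hypothetical data: 16 RGB images of size 28-by-28,
% stored as height-by-width-by-channel-by-batch.
X = dlarray(rand(28,28,3,16));   % unformatted dlarray

% 8 filters of size 3-by-3 over 3 input channels, one bias per filter
% (arbitrary example values).
weights = rand(3,3,3,8);
bias = zeros(8,1);

% Unformatted input: describe the layout with DataFormat.
Y = dlconv(X,weights,bias,DataFormat="SSCB");
size(Y)    % 26 26 8 16
dims(Y)    % empty: the output is unformatted, in the same order as X

% Formatted alternative: the format travels with the data.
Xf = dlarray(extractdata(X),"SSCB");
Yf = dlconv(Xf,weights,bias);
dims(Yf)   % 'SSCB'
```

With the default stride of 1 and no padding, each 28-pixel spatial dimension shrinks to 28 - 3 + 1 = 26, and the channel dimension becomes the number of filters.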
To view the layout information of dlarray objects, use the dims function. To view the layout information of layer outputs, use the analyzeNetwork function.
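For example, this minimal sketch (assuming MATLAB with Deep Learning Toolbox and a hypothetical "CBT" array) uses dims to inspect a formatted array and stripdims to remove its labels.

```matlab
% Hypothetical formatted array: a batch of vector sequences.
X = dlarray(rand(3,16,10),"CBT");
dims(X)     % 'CBT'

% stripdims removes the labels but keeps the data in its current order.
Xu = stripdims(X);
dims(Xu)    % empty: Xu is an unformatted dlarray
size(Xu)    % 3 16 10
```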
See Also
dlarray | dims | stripdims | analyzeNetwork