## Code Generation for Quantized Deep Learning Networks

Deep learning uses neural network architectures that contain many processing layers,
including convolutional layers. Deep learning models typically work on large sets of labeled
data. Performing inference on these models is computationally intensive and consumes a significant
amount of memory. Neural networks use memory to store input data, parameters (weights), and
activations from each layer as the input propagates through the network. Deep neural networks
trained in MATLAB® use single-precision floating-point data types. Even networks that are small in
size require a considerable amount of memory and hardware to perform these floating-point
arithmetic operations. These restrictions can inhibit deployment of deep learning models to
devices that have low computational power and limited memory resources. By using a lower
precision to store the weights and activations, you can reduce the memory requirements of the
network.

You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the quantized network.
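As a sketch of that first step, the quantization workflow might look like the following. The network variable `net`, the calibration datastore `calDS`, and the MAT-file name are hypothetical placeholders, not names defined on this page.

```matlab
% Hypothetical sketch: quantize a pretrained network for CPU targets.
% "net" is an existing trained network and "calDS" a calibration datastore;
% both are placeholders.
quantObj = dlquantizer(net, 'ExecutionEnvironment', 'CPU');
calibrate(quantObj, calDS);                       % collect dynamic ranges of weights and activations
save('dlquantizerObjectMatFile.mat', 'quantObj'); % MAT-file later passed to the code generation step
```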

### ARM Cortex-A Processors

The generated code takes advantage of ARM® processor SIMD instructions by using the ARM Compute Library. The generated code can be integrated into your project as
source code, static or dynamic libraries, or executables that you can deploy to a variety of
ARM Cortex-A CPU platforms such as Raspberry Pi™. To deploy quantized networks on ARM Cortex-A processors, you must use ARM Compute Library version 20.02.1.

#### Supported Layers and Classes

You can generate C++ code for these layers that uses the ARM Compute Library and performs inference computations in 8-bit integers:

- 2-D average pooling layer (`averagePooling2dLayer` (Deep Learning Toolbox))
- 2-D convolution layer (`convolution2dLayer` (Deep Learning Toolbox))
- Fully connected layer (`fullyConnectedLayer` (Deep Learning Toolbox))
- 2-D grouped convolution layer (`groupedConvolution2dLayer` (Deep Learning Toolbox)). The value of the `NumGroups` input argument must be equal to `2`.
- Max pooling layer (`maxPooling2dLayer` (Deep Learning Toolbox))
- Rectified Linear Unit (ReLU) layer (`reluLayer` (Deep Learning Toolbox))
- Input and output layers

C++ code generation for such quantized deep learning networks supports `DAGNetwork` (Deep Learning Toolbox) and `SeriesNetwork` (Deep Learning Toolbox) objects.
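For illustration, a network built from the layer types listed above might be assembled as follows. Layer sizes are arbitrary, and the softmax layer before the classification output is included only because training a classifier requires it; this is a sketch, not a canonical example.

```matlab
% Hypothetical sketch of a layer array using the supported layer types.
layers = [
    imageInputLayer([28 28 1])                            % input layer
    convolution2dLayer(3, 16, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    groupedConvolution2dLayer(3, 8, 2, 'Padding', 'same') % NumGroups must be 2
    reluLayer
    averagePooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];                                 % output layer
```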

#### Generating Code

To generate code that performs inference computations in 8-bit integers, in your
`coder.ARMNEONConfig` object `dlcfg`, set these additional properties:

```matlab
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
dlcfg.DataType = 'int8';
```

Alternatively, in the MATLAB Coder app, on the **Deep Learning** tab, set **Target library** to `ARM Compute`. Then set the **Data type** and **Calibration result file path** parameters.

Here `'dlquantizerObjectMatFile'` is the name of the MAT-file that `dlquantizer` (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the `ExecutionEnvironment` property of the `dlquantizer` object to `'CPU'`.

Otherwise, follow the steps described in Code Generation for Deep Learning Networks with ARM Compute Library.
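Putting the pieces together, a complete ARM code generation run might resemble the following sketch. The entry-point function, file names, input size, and target architecture are hypothetical placeholders; verify the configuration property names against your release.

```matlab
% --- Entry-point function (save as net_predict.m; names are placeholders) ---
function out = net_predict(in)
persistent net;
if isempty(net)
    % Load the quantized network saved during the dlquantizer workflow
    net = coder.loadDeepLearningNetwork('quantizedNet.mat');
end
out = predict(net, in);
end
```

```matlab
% --- Code generation script (sketch) ---
dlcfg = coder.DeepLearningConfig('arm-compute');  % returns a coder.ARMNEONConfig
dlcfg.ArmComputeVersion = '20.02.1';
dlcfg.ArmArchitecture = 'armv7';                  % placeholder; match your target CPU
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
dlcfg.DataType = 'int8';

cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = dlcfg;

codegen -config cfg net_predict -args {ones(224,224,3,'single')}
```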

For an example, see Code Generation for Quantized Deep Learning Network on Raspberry Pi.

### ARM Cortex-M Processors

The generated code takes advantage of the CMSIS-NN library version 5.7.0 and can be integrated into your project as a static library that you can deploy to a variety of ARM Cortex-M CPU platforms.

#### Supported Layers and Classes

The code generated for the `fullyConnectedLayer` (Deep Learning Toolbox) object, which represents a fully connected layer, uses the CMSIS-NN library and performs inference computations in 8-bit integers.

Your deep learning network can also contain the following layers. The generated code performs computations for these layers in 32-bit floating point type.

- `lstmLayer` (Deep Learning Toolbox) object, which represents a long short-term memory layer. The value of `SequenceLength` that you pass to `predict` must be a compile-time constant.
- `softmaxLayer` (Deep Learning Toolbox) object, which represents a softmax layer.
- Input and output layers.

C code generation for such quantized deep learning networks supports `SeriesNetwork` (Deep Learning Toolbox) objects and `DAGNetwork` (Deep Learning Toolbox) objects that can be converted to `SeriesNetwork` objects.
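To satisfy the compile-time-constant `SequenceLength` requirement, an entry-point function might pass a literal value, as in this sketch (the function name and MAT-file are hypothetical placeholders):

```matlab
function out = lstm_predict(in)
% Hypothetical entry-point function; 'lstmNet.mat' is a placeholder.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('lstmNet.mat');
end
% 'SequenceLength' must be a compile-time constant, such as the
% literal 'longest' here, for code generation to succeed.
out = predict(net, in, 'SequenceLength', 'longest');
end
```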

#### Generating Code

To generate code that performs inference computations in 8-bit integers by using the
CMSIS-NN library, in your `coder.CMSISNNConfig` object `dlcfg`, set the
`CalibrationResultFile` property:

`dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';`

Alternatively, in the MATLAB Coder app, on the **Deep Learning** tab, set **Target library** to `CMSIS-NN`. Then set the **Calibration result file path** parameter.

Here `'dlquantizerObjectMatFile'` is the name of the MAT-file that `dlquantizer` (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the `ExecutionEnvironment` property of the `dlquantizer` object to `'CPU'`.
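A complete CMSIS-NN code generation run might resemble the following sketch. The entry-point function, file names, input size, and hardware board are hypothetical placeholders.

```matlab
% Sketch: int8 code generation with the CMSIS-NN library.
dlcfg = coder.DeepLearningConfig('cmsis-nn');   % returns a coder.CMSISNNConfig
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';

cfg = coder.config('lib');                      % static library output
cfg.DeepLearningConfig = dlcfg;
% For deployment, you typically also set a Cortex-M hardware target, e.g.:
% cfg.Hardware = coder.hardware('STM32 Nucleo F411RE');  % placeholder board

codegen -config cfg net_predict -args {ones(1, 100, 'single')}
```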

For an example, see Code Generation for Quantized Deep Learning Network on Cortex-M Target.

## See Also

### Apps

- Deep Network Quantizer (Deep Learning Toolbox)

### Functions

`dlquantizer` (Deep Learning Toolbox) | `dlquantizationOptions` (Deep Learning Toolbox) | `calibrate` (Deep Learning Toolbox) | `validate` (Deep Learning Toolbox) | `coder.loadDeepLearningNetwork` | `codegen`