
Code Generation for Quantized Deep Learning Network on Cortex-M Target

Deep learning uses neural network architectures that contain many processing layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB use single-precision floating-point data types. Even small networks require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These requirements can inhibit deployment of deep learning models to devices that have low computational power and limited memory. Storing the weights and activations at a lower precision reduces the memory requirements of the network.
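The memory argument above can be illustrated with simple arithmetic. The following is a minimal sketch, not part of the original example: the parameter count numParams is a hypothetical value chosen only for illustration, and the sketch assumes that every parameter is stored in int8, which might not hold for every layer of a real network.

```matlab
% Hypothetical parameter count, chosen only for illustration.
numParams = 50000;

bytesPerSingle = 4;   % single precision uses 32 bits per value
bytesPerInt8   = 1;   % int8 uses 8 bits per value

% Storage needed for the parameters at each precision, in kilobytes.
memSingleKB = numParams * bytesPerSingle / 1024;
memInt8KB   = numParams * bytesPerInt8   / 1024;

% Ratio between the two storage requirements.
reduction = memSingleKB / memInt8KB;

fprintf('single: %.1f KB, int8: %.1f KB, reduction: %.0fx\n', ...
    memSingleKB, memInt8KB, reduction);
```

Under these assumptions the parameter storage shrinks by a factor of four. The saving on a real network depends on which layers are quantized.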

This example shows how to quantize a pretrained deep learning network, generate a C static library for it, and deploy the code on a Cortex-M processor. The generated code reduces memory consumption by performing inference computations in 8-bit integers for the fully connected layer, and it takes advantage of the SIMD capabilities of the ARM® processor by using the CMSIS-NN library. In this example, an LSTM network predicts human activity based on time series data representing accelerometer readings in three different directions.

In this example, you generate a processor-in-the-loop (PIL) MEX function. When you run the PIL MEX function within the MATLAB environment on your host computer, the PIL interface in turn executes the generated executable on the target hardware.


  • This example uses a pretrained LSTM network. For more information on how to train an LSTM network, see Sequence Classification Using Deep Learning (Deep Learning Toolbox).

  • Reduction in memory consumption and performance improvement might depend on the specific network you choose to deploy.

  • This example is supported on the Windows® platform only.

Third-Party Prerequisites

  • Cortex-M hardware - STM32F746G Discovery board

  • CMSIS-NN Library

Quantize the Network

Load the pretrained network attached as a MAT-file. Create a dlquantizer (Deep Learning Toolbox) object and specify the network to quantize.

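% Load the pretrained network (assumed to be stored in a variable named net).
load('activityRecognisationNet.mat');
dq = dlquantizer(net, 'ExecutionEnvironment', 'CPU');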

Use the calibrate (Deep Learning Toolbox) function to exercise the network with sample inputs and collect range information. All sequences in the training data that you pass to the calibrate function must have the same length. The function collects the dynamic ranges of the weights and biases in the LSTM and fully connected layers, and the dynamic ranges of the activations in all layers of the network. Save the dlquantizer object as a MAT-file to pass it to the codegen function.

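% XTrain and YTrain (training sequences and labels) are assumed to be in the workspace.
xDs = arrayDatastore(cellfun(@single,XTrain,'UniformOutput',false),"IterationDimension",1,"OutputType","same");
tDs = arrayDatastore(YTrain,"IterationDimension",1,"OutputType","same");
data = combine(xDs,tDs);
% Exercise the network with the calibration data to collect dynamic ranges.
calibrate(dq, data);
save('activityRecognisationQuantObj.mat', 'dq')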
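To give a feel for why calibration records dynamic ranges, the following sketch maps a small set of single-precision values to int8 by using a symmetric linear scale. This is a generic illustration under stated assumptions: the values in w are hypothetical, and the scaling scheme shown here is not necessarily the one that dlquantizer or the generated code uses.

```matlab
% Hypothetical weights in single precision.
w = single([-0.75 0.10 0.42 1.20]);

% Dynamic range of these values, similar in spirit to what calibration records.
maxAbs = max(abs(w));

% Symmetric linear scaling so that maxAbs maps to 127 (illustrative scheme only).
scale = maxAbs / 127;

% Quantize: round to the nearest integer; int8 saturates at [-128, 127].
q = int8(round(w / scale));

% Dequantize and measure the error introduced by 8-bit storage.
wHat   = single(q) * scale;
maxErr = max(abs(w - wHat));
```

With this scheme the rounding error per value is bounded by about half the scale, so a wider dynamic range leads to a coarser representation.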

Generate PIL MEX Function

In this example, you generate code for the entry-point function net_predict.m. This function uses the coder.loadDeepLearningNetwork function to load a deep learning model and to construct and set up a recurrent neural network (RNN). The entry-point function then predicts the responses by using the predict (Deep Learning Toolbox) function.

type net_predict.m
% Copyright 2021 The MathWorks, Inc.

function out = net_predict(netFile, in)
net = coder.loadDeepLearningNetwork(netFile);
out = net.predict(in);

To generate a PIL MEX function, create a code configuration object for a static library and set the verification mode to 'PIL'. Set the target language to C. Limit the stack size to a reasonable value, for example 512 bytes, because the default size is much larger than the memory available on the hardware board.

cfg = coder.config('lib', 'ecoder', true);
cfg.VerificationMode = 'PIL';
cfg.StackUsageMax = 512;
cfg.TargetLang = 'C';

Create a deep learning configuration object for the CMSIS-NN library.

dlcfg = coder.DeepLearningConfig('cmsis-nn');

Attach the saved dlquantizer object MAT-file to dlcfg to generate code that performs low-precision (int8) inference.

dlcfg.CalibrationResultFile = 'activityRecognisationQuantObj.mat'; 

Set the DeepLearningConfig property of cfg to dlcfg.

cfg.DeepLearningConfig = dlcfg;

Create a coder.Hardware object for the STM32F746G-Discovery board and attach it to the code generation configuration object. In the following code, replace comport with the port to which the Cortex-M hardware is connected. Also, on the Cortex-M hardware board, set the CMSISNN_PATH environment variable to the location of the CMSIS-NN library build on the Cortex-M board. For more information on building the library and setting environment variables, see Prerequisites for Deep Learning with MATLAB Coder.

hwName = 'STM32F746G-Discovery';
hw = coder.hardware(hwName);
hw.PILInterface = 'Serial';
% Uncomment the below line and replace comport with the actual port number
% hw.PILCOMPort = comport; 
cfg.Hardware = hw;
cfg.BuildConfiguration = 'Faster Builds';
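If you are unsure which port the board is connected to, one option is to list the serial ports that MATLAB reports. The following is a minimal sketch that assumes a MATLAB release that provides the serialportlist function. Picking the first reported port is a hypothetical choice for illustration only, so confirm the port in the device manager of your operating system before assigning it to hw.PILCOMPort.

```matlab
% List the serial ports that MATLAB currently reports as available.
ports = serialportlist("available");
disp(ports)

% Hypothetical selection: take the first reported port. Verify that this
% port actually belongs to the board before using it for PIL.
if ~isempty(ports)
    comport = char(ports(1));
end
```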

Generate a PIL MEX function by using the codegen command. The -args option specifies the network MAT-file name as a compile-time constant and an example input of type single and size 3-by-10.

codegen -config cfg net_predict -args {coder.Constant('activityRecognisationNet.mat'),single(zeros(3,10))}

Run Generated PIL MEX Function

Load test data from HumanActivityTest.mat.

load('HumanActivityTest.mat');
inputData = single(XTest{1}(1:3,1:10));

Run the generated PIL MEX function net_predict_pil on the test data.

YPred = net_predict_pil('activityRecognisationNet.mat', inputData);
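One way to sanity-check the quantized result is to compare it with the prediction of the original single-precision network in MATLAB. The following sketch is an illustration rather than part of the original example. It assumes that the network variable net is still in the workspace, and the names YRef and maxDiff are introduced here for illustration only.

```matlab
% Reference prediction from the original single-precision network.
% Assumes net and inputData are available in the workspace.
YRef = predict(net, inputData);

% Largest absolute difference between the floating-point result and the
% result computed on the target. The arrays are compared element by element.
maxDiff = max(abs(YRef(:) - YPred(:)));
fprintf('Maximum absolute difference: %g\n', maxDiff);

% Clearing the PIL MEX function ends the PIL execution on the target.
clear net_predict_pil
```

Some difference is expected because part of the inference runs in 8-bit integers. How much difference is acceptable depends on the application.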