
quantize

Quantize deep neural network

Since R2022a

Description

quantizedNetwork = quantize(quantObj) quantizes a deep neural network using a calibrated dlquantizer object, quantObj. The quantized neural network object, quantizedNetwork, exposes the quantized layers, weights, and biases of the network and simulates quantized inference behavior.


quantizedNetwork = quantize(quantObj,Name,Value) specifies additional options using one or more name-value arguments.

This function requires Deep Learning Toolbox Model Quantization Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.


Examples


This example shows how to create a target-agnostic, simulatable quantized deep neural network in MATLAB.

Target-agnostic quantization allows you to see the effect quantization has on your neural network without target hardware or target-specific quantization schemes. Creating a target-agnostic quantized network is useful if you:

  • Do not have access to your target hardware.

  • Want to preview whether your network is suitable for quantization.

  • Want to find layers that are sensitive to quantization.

Quantized networks emulate quantized behavior for quantization-compatible layers. The network architecture, such as the layers and connections, is the same as in the original network, but inference uses limited-precision data types. After you quantize your network, you can use the quantizationDetails function to retrieve details on what was quantized.
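To make "limited-precision data types" concrete, here is a minimal base-MATLAB sketch, not the toolbox implementation, that emulates a signed 8-bit type with power-of-two scaling by quantizing and then dequantizing a few values. The fraction length fl and the sample values are hypothetical choices for illustration.

```matlab
% Emulate a signed 8-bit fixed-point type with power-of-two scaling.
% Hypothetical sketch: the fraction length fl is chosen by hand here.
x    = [0.731 -1.204 0.0049 1.5 -0.333];  % real-world values
fl   = 6;                                 % scaling factor is 2^-fl
qmin = -128;                              % int8 stored-integer range
qmax =  127;

% Quantize: scale, round to nearest, and saturate to the int8 range.
stored = int8(min(max(round(x * 2^fl), qmin), qmax));

% Dequantize: map the stored integers back to real-world values.
xhat = double(stored) * 2^(-fl);

% For values inside the representable range [-2, 127/64], the rounding
% error is at most half of one least significant bit, 2^-(fl+1).
err = abs(x - xhat);
disp([x; xhat; err])
```

With this scaling, 0.0049 rounds to a stored value of 0, which shows how very small values can be lost at a given precision.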

Load the pretrained network. net is a SqueezeNet network that has been retrained using transfer learning to classify images in the MerchData data set.

load squeezedlnetmerch
net
net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

You can use the quantizationDetails function to see that the network is not quantized.

qDetailsOriginal = quantizationDetails(net)
qDetailsOriginal = struct with fields:
            IsQuantized: 0
          TargetLibrary: ""
    QuantizedLayerNames: [0×0 string]
    QuantizedLearnables: [0×3 table]

Unzip and load the MerchData images as an image datastore and extract the classes from the datastore.

unzip('MerchData.zip')
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);

Define calibration and validation data to use for quantization. Resize the images in both the calibration and validation data to the 227-by-227 input size that the network requires.

[calData,valData] = splitEachLabel(imds,0.7,'randomized');
augCalData = augmentedImageDatastore([227 227],calData);
augValData = augmentedImageDatastore([227 227],valData);
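splitEachLabel applies the 0.7 proportion to each label separately, so every class is represented in both the calibration and validation sets. The following base-MATLAB sketch mimics that idea with numeric class IDs standing in for the datastore labels. It is hypothetical: the rounding rule that splitEachLabel uses for non-integer counts is not stated on this page, so the sketch simply rounds to the nearest count.

```matlab
% Hypothetical per-label randomized split, in the spirit of
% splitEachLabel(imds,0.7,'randomized'). Numeric class IDs stand in for labels.
labels = [ones(10,1); 2*ones(10,1); 3*ones(10,1)];  % 10 files per class
p = 0.7;                                            % calibration fraction per label

calIdx = zeros(0,1);
valIdx = zeros(0,1);
for c = unique(labels)'
    idx  = find(labels == c);            % indices of files with this label
    idx  = idx(randperm(numel(idx)));    % shuffle within the label
    nCal = round(p * numel(idx));        % assumption: round to the nearest count
    calIdx = [calIdx; idx(1:nCal)];      %#ok<AGROW>
    valIdx = [valIdx; idx(nCal+1:end)];  %#ok<AGROW>
end
```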

Create a dlquantizer object and specify the network to quantize. Set the execution environment to MATLAB. How the network is quantized depends on the execution environment. The MATLAB execution environment is agnostic to the target hardware and allows you to prototype quantized behavior. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type, which requires a Fixed-Point Designer™ license.

quantObj = dlquantizer(net,'ExecutionEnvironment','MATLAB');

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.
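As a simplified mental model of range collection, and not a description of how calibrate is implemented, the sketch below tracks the running minimum and maximum of one layer's activations over several calibration batches. The batch values are made up.

```matlab
% Hypothetical sketch of dynamic-range collection: track the running
% minimum and maximum of one layer's activations across calibration batches.
batches = {[0.2 -0.5 1.1], [0.9 -1.3 0.4], [2.0 0.1 -0.2]};  % stand-in activations

rangeMin =  Inf;
rangeMax = -Inf;
for k = 1:numel(batches)
    a = batches{k};
    rangeMin = min(rangeMin, min(a(:)));   % smallest value seen so far
    rangeMax = max(rangeMax, max(a(:)));   % largest value seen so far
end
fprintf('min = %g, max = %g\n', rangeMin, rangeMax);
```

Because the final range depends on every batch, a calibration set that omits typical inputs can produce a range that is too narrow.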

calResults = calibrate(quantObj,augCalData);

Use the quantize method to quantize the network object and return a simulatable quantized network.

qNet = quantize(quantObj)  
qNet = 
  Quantized dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
  Use the quantizationDetails function to extract quantization details.

You can use the quantizationDetails function to see that the network is now quantized.

qDetailsQuantized = quantizationDetails(qNet)
qDetailsQuantized = struct with fields:
            IsQuantized: 1
          TargetLibrary: "none"
    QuantizedLayerNames: [53×1 string]
    QuantizedLearnables: [52×3 table]

Make predictions using the original single-precision floating-point network and the quantized INT8 network.

origScores = minibatchpredict(net,augValData);
predOriginal = scores2label(origScores,classes);    % Predictions for the non-quantized network

qScores = minibatchpredict(qNet,augValData);
predQuantized = scores2label(qScores,classes);     % Predictions for the quantized network 

Compute the classification accuracy of each network on the validation data to compare the quantized network with the original network.

ccrQuantized = mean(squeeze(predQuantized) == valData.Labels)*100
ccrQuantized = 100
ccrOriginal = mean(squeeze(predOriginal) == valData.Labels)*100
ccrOriginal = 100

For this validation data set, the quantized network gives the same predictions as the floating-point network.
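The accuracy computations above compare each network with the ground-truth labels. Because the question is whether quantization changes predictions, it can also help to measure agreement between the two networks directly. This sketch uses made-up numeric class IDs in place of the categorical labels that scores2label returns.

```matlab
% Hypothetical predicted class IDs for six validation images.
trueLabels    = [1 2 3 3 2 1];
predOriginal  = [1 2 3 3 2 1];
predQuantized = [1 2 3 2 2 1];

% Correct classification rate, in percent, for each network.
ccrOriginal  = mean(predOriginal  == trueLabels) * 100;
ccrQuantized = mean(predQuantized == trueLabels) * 100;

% Percentage of images on which the two networks agree,
% independent of whether either prediction is correct.
agreement = mean(predQuantized == predOriginal) * 100;
```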

This example shows how to emulate the behavior of a quantized network for GPU deployment. Once you quantize your network for a GPU execution environment, you can emulate the GPU target behavior without the GPU hardware. Doing so allows you to examine your quantized network structure and behavior without generating code for deployment.

Emulated quantized networks are not smaller than the original network.

Load the pretrained network. net is a SqueezeNet convolutional neural network that has been retrained using transfer learning to classify images in the MerchData data set.

load squeezedlnetmerch
net
net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Define calibration and validation data to use for quantization.

Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

Use the validation data to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.
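Limited range and limited precision cause different kinds of error, which is why the calibration data must be representative. In this hypothetical base-MATLAB sketch, a fraction length is assumed to have been chosen from calibration data with values below 2. A validation value inside that range incurs only a small rounding error, while a larger value saturates.

```matlab
% Hypothetical sketch: fl = 6 gives an int8 representable range of
% [-2, 127/64]. Values outside the calibrated range saturate.
fl  = 6;
toQ = @(v) double(int8(min(max(round(v * 2^fl), -128), 127))) * 2^(-fl);

inRange  = 1.23;   % within the calibrated range
outRange = 3.0;    % larger than anything seen during calibration

errIn  = abs(inRange  - toQ(inRange));   % precision (rounding) error only
errOut = abs(outRange - toQ(outRange));  % range (saturation) error
```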

For this example, use the images in the MerchData data set. Split the data into calibration and validation data sets.

unzip("MerchData.zip");
imds = imageDatastore("MerchData", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
classes = categories(imds.Labels);
[calData,valData] = splitEachLabel(imds,0.7,"randomized");

Create a dlquantizer object and specify the network to quantize. How the network is quantized depends on the execution environment. Set ExecutionEnvironment to GPU to perform quantization specific to GPU target hardware.

quantObj = dlquantizer(net,ExecutionEnvironment="GPU");

Use the calibrate function to exercise the network object with sample inputs and collect range information.

calResults = calibrate(quantObj,calData);

Use the quantize method to quantize the network object and return a simulatable quantized network.

qNet = quantize(quantObj)
qNet = 
  Quantized dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
  Use the quantizationDetails function to extract quantization details.

You can use the quantizationDetails function to see that the network is now quantized.

qDetails = quantizationDetails(qNet) 
qDetails = struct with fields:
            IsQuantized: 1
          TargetLibrary: "cudnn"
    QuantizedLayerNames: [55×1 string]
    QuantizedLearnables: [35×3 table]

The TargetLibrary field shows that the quantized network emulates the CUDA® Deep Neural Network library (cuDNN).

The QuantizedLayerNames field displays a list of layers that have been quantized.

qDetails.QuantizedLayerNames(1:5)
ans = 5×1 string
    "conv1"
    "relu_conv1"
    "pool1"
    "fire2-squeeze1x1"
    "fire2-relu_squeeze1x1"

The QuantizedLearnables field contains additional details on the quantized learnable parameters of the network. In this example, the weights of the 2-D convolutional layer conv1 are scaled and cast to int8, while its bias is scaled and remains in single precision. The values of the quantized learnables are returned as stored integer values.

qDetails.QuantizedLearnables
ans=35×3 table
          Layer           Parameter           Value       
    __________________    _________    ___________________

    "conv1"               "Weights"    {3×3×3×64   int8  }
    "conv1"               "Bias"       {1×1×64     single}
    "fire2-squeeze1x1"    "Weights"    {1×1×64×16  int8  }
    "fire2-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire2-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire2-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire3-squeeze1x1"    "Weights"    {1×1×128×16 int8  }
    "fire3-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire3-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire3-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire4-squeeze1x1"    "Weights"    {1×1×128×32 int8  }
    "fire4-squeeze1x1"    "Bias"       {1×1×32     single}
    "fire4-expand1x1"     "Weights"    {1×1×32×128 int8  }
    "fire4-expand3x3"     "Weights"    {3×3×32×128 int8  }
    "fire5-squeeze1x1"    "Weights"    {1×1×256×32 int8  }
    "fire5-squeeze1x1"    "Bias"       {1×1×32     single}
      ⋮
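Because the int8 weights are stored integers, an approximate real-world value is the stored integer times the scaling factor. The table above does not show the scaling factors, so this base-MATLAB sketch assumes a hypothetical power-of-two fraction length; the stored values are also made up.

```matlab
% Hypothetical stored int8 weights and fraction length (not read from qNet).
storedW = int8([-128 -64 0 37 127]);
fl = 9;                                % assumed: real value = stored * 2^-fl

realW  = double(storedW) * 2^(-fl);    % approximate real-world weights
maxMag = 128 * 2^(-fl);                % largest magnitude this scaling can represent
```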

You can use the quantized network to emulate how a network quantized for GPU target hardware would perform a classification task.

Make predictions using the original, single-precision floating-point network. To accelerate the computation by compiling and executing a MEX function on the GPU, use the acceleration option "mex" of the predict function.

XTest = readall(valData);
XTest = cat(4,XTest{:});
XTest = dlarray(gpuArray(single(XTest)),"SSCB");                        
TTest = valData.Labels;

YTestOriginal = predict(net,XTest,Acceleration="mex");
Generating MEX for cudnn target. 
YTestOriginal = onehotdecode(YTestOriginal,classes,3);

Make predictions using the quantized INT8 network. Use the acceleration option "mex" of the predict function. MEX acceleration is supported for quantized networks based on quantization objects with ExecutionEnvironment set to GPU.

YTestQuantized = predict(qNet,XTest,Acceleration="mex");
Generating MEX for cudnn target. 
YTestQuantized = onehotdecode(YTestQuantized,classes,3);
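The code above stacks the validation images along the fourth dimension (the "B" in "SSCB") and decodes the scores along dimension 3. The following base-MATLAB sketch mirrors those two steps with tiny stand-in arrays: cat(4,...) builds an H-by-W-by-C-by-B batch, and taking the index of the maximum score along dimension 3 picks the most likely class, which is what onehotdecode returns. Cell arrays of character vectors with hypothetical class names replace the categorical labels.

```matlab
% Hypothetical stand-ins for three RGB images returned by readall
% (the real images in this example are 227-by-227-by-3).
imgs  = {rand(4,4,3), rand(4,4,3), rand(4,4,3)};
XTest = cat(4, imgs{:});                 % H-by-W-by-C-by-B batch

% Hypothetical scores with classes along dimension 3:
% 1-by-1-by-numClasses-by-numImages.
classes = {'classA', 'classB', 'classC'};
scores  = zeros(1, 1, 3, 2);
scores(1, 1, :, 1) = reshape([0.1 0.7 0.2], 1, 1, 3);
scores(1, 1, :, 2) = reshape([0.6 0.3 0.1], 1, 1, 3);

% Index of the highest score along dimension 3 for each image.
[~, idx] = max(scores, [], 3);
predLabels = classes(idx(:));            % most likely class name per image
```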

Compute the classification accuracy of each network on the validation data to compare the quantized network with the original network.

ccrOriginal = mean(squeeze(YTestOriginal) == valData.Labels)
ccrOriginal = 1
ccrQuantized = mean(squeeze(YTestQuantized) == valData.Labels)
ccrQuantized = 1

The quantized network shows no drop in accuracy.

This example shows how to emulate the behavior of a quantized network for FPGA deployment. Once you quantize your network for an FPGA execution environment, you can emulate the FPGA target behavior without any FPGA hardware. This action allows you to examine your quantized network structure and behavior without generating code for deployment.

Load the pretrained network.

if ~isfile("LogoNet.mat")
    url = "https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat";
    websave("LogoNet.mat",url);
end
data = load("LogoNet.mat");
net  = data.convnet;

Define calibration and validation data to use for quantization.

Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers, the dynamic ranges of the activations in all the layers, and the dynamic ranges of the parameters for some layers. For the best quantization results, the calibration data must be representative of inputs to the network.

Use the validation data to test the network after quantization. Test the network to determine the effects of the limited range and precision of the quantized layers and layer parameters in the network.

This example uses the images in the logos_dataset data set. Create an imageDatastore object, then split the data into calibration and validation data sets.

unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(pwd,"logos_dataset"),...
    IncludeSubfolders=true,FileExtensions=".JPG",LabelSource="foldernames");
[calData,valData] = splitEachLabel(imageData,0.7,"randomized");

Create a dlquantizer object and specify the network to quantize. Set the execution environment for the quantized network to FPGA.

quantObj = dlquantizer(net,ExecutionEnvironment="FPGA");

Use the calibrate function to exercise the network with sample inputs and collect range information.

calResults = calibrate(quantObj,calData,UseGPU="off");

Use the quantize function to quantize the network object and return a quantized network for simulation.

qNet = quantize(quantObj)
qNet = 
  Quantized DAGNetwork with properties:

         Layers: [22x1 nnet.cnn.layer.Layer]
    Connections: [21x2 table]
     InputNames: {'imageinput'}
    OutputNames: {'classoutput'}

  Use the quantizationDetails function to extract quantization details.


You can use the quantizationDetails function to confirm that the network is now quantized. The TargetLibrary field shows that the quantized network emulates an FPGA target.

qDetails = quantizationDetails(qNet)
qDetails = struct with fields:
            IsQuantized: 1
          TargetLibrary: "fpga"
    QuantizedLayerNames: [17x1 string]
    QuantizedLearnables: [14x3 table]

The QuantizedLayerNames field displays a list of quantized layers.

qDetails.QuantizedLayerNames
ans = 17x1 string
    "conv_1"
    "relu_1"
    "maxpool_1"
    "conv_2"
    "relu_2"
    "maxpool_2"
    "conv_3"
    "relu_3"
    "maxpool_3"
    "conv_4"
    "relu_4"
    "maxpool_4"
    "fc_1"
    "relu_5"
    "fc_2"
    "relu_6"
    "fc_3"

The QuantizedLearnables field contains additional details about the quantized network learnable parameters. In this example, the 2-D convolutional layers and fully connected layers have their weights scaled and cast to int8 and their biases scaled and cast to int32. The quantizationDetails function returns the values of the quantized learnables as stored integer values.

qDetails.QuantizedLearnables
ans=14×3 table
     Layer      Parameter            Value        
    ________    _________    _____________________

    "conv_1"    "Weights"    {5x5x3x96      int8 }
    "conv_1"    "Bias"       {1x1x96        int32}
    "conv_2"    "Weights"    {3x3x96x128    int8 }
    "conv_2"    "Bias"       {1x1x128       int32}
    "conv_3"    "Weights"    {3x3x128x384   int8 }
    "conv_3"    "Bias"       {1x1x384       int32}
    "conv_4"    "Weights"    {3x3x384x128   int8 }
    "conv_4"    "Bias"       {1x1x128       int32}
    "fc_1"      "Weights"    {5x5x128x2048  int8 }
    "fc_1"      "Bias"       {1x1x2048      int32}
    "fc_2"      "Weights"    {1x1x2048x2048 int8 }
    "fc_2"      "Bias"       {1x1x2048      int32}
    "fc_3"      "Weights"    {1x1x2048x32   int8 }
    "fc_3"      "Bias"       {1x1x32        int32}

You can use the quantized network to emulate a network quantized for FPGA target hardware performing a classification task.

ypred = classify(qNet,valData);
ccr = mean(ypred == valData.Labels)
ccr = 1
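The QuantizedLearnables table for the FPGA network stores the weights as int8 and the biases as int32. This page does not explain why; one common reason in integer inference, which this sketch assumes applies, is that int8 inputs and weights are multiplied and summed in a 32-bit accumulator whose scale is the product of the input and weight scales, so a bias stored as int32 at that same scale can be added directly. All fraction lengths and values below are hypothetical.

```matlab
% Hypothetical sketch of an integer dot product plus bias, using
% power-of-two scales. Fraction lengths and values are assumptions.
flX = 5;                          % input activation fraction length
flW = 7;                          % weight fraction length
x = int8([100 -90 127 64]);       % stored input activations
w = int8([127 127 -128 90]);      % stored weights

% Accumulate int8 products in int32 so that long sums do not overflow.
acc = int32(0);
for k = 1:numel(x)
    acc = acc + int32(x(k)) * int32(w(k));
end

% The accumulator scale is 2^-(flX+flW); store the bias at that scale.
biasReal   = 0.75;
biasStored = int32(round(biasReal * 2^(flX + flW)));
acc = acc + biasStored;

% Convert the integer result back to a real-world value.
yReal = double(acc) * 2^(-(flX + flW));
```

For these values, yReal equals the floating-point result of the dot product of the dequantized inputs and weights plus 0.75, because every intermediate value is exactly representable.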

Input Arguments


quantObj is a dlquantizer object containing the network to quantize, calibrated using the calibrate object function. The ExecutionEnvironment property of quantObj must be set to 'GPU', 'FPGA', or 'MATLAB'.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: quantizedNetwork = quantize(quantObj,'ExponentScheme','Histogram')

ExponentScheme is the exponent selection scheme, specified as one of these values:

  • 'MinMax' — Evaluate the exponent based on the range information in the calibration statistics and avoid overflows.

  • 'Histogram' — Distribution-based scaling that evaluates the exponent to best fit the calibration data.

Example: 'ExponentScheme','Histogram'
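The 'MinMax' description above says only that the exponent is based on the calibration range and avoids overflow. A minimal base-MATLAB sketch of one such rule, assumed here and not taken from the toolbox, picks the largest power-of-two fraction length for which the largest calibrated magnitude still fits in a signed 8-bit integer. The range values are hypothetical.

```matlab
% Hypothetical MinMax-style exponent selection for a signed 8-bit type.
rangeMin = -1.3;                           % calibrated minimum (made up)
rangeMax =  2.0;                           % calibrated maximum (made up)
maxAbs   = max(abs([rangeMin rangeMax]));

wordLength = 8;
qmax = 2^(wordLength - 1) - 1;             % 127 for int8

% Largest fraction length fl such that maxAbs * 2^fl <= qmax (no overflow).
fl = floor(log2(qmax / maxAbs));
```

For this range, fl is 5, because 2.0 * 2^6 = 128 would overflow int8. A distribution-based scheme such as 'Histogram' may instead trade some saturation of rare large values for finer resolution; how the toolbox does this is not described on this page.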

Output Arguments


Quantized neural network, returned as a dlnetwork, DAGNetwork, yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), or ssdObjectDetector (Computer Vision Toolbox) object.

Limitations

  • The quantize function does not support quantization of networks using dlquantizer objects with ExecutionEnvironment set to 'CPU'.

  • Code generation does not support quantized deep neural networks produced by the quantize function.

Version History

Introduced in R2022a
