Quantize Network for FPGA Deployment
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8
network to a target FPGA board.
For this example, you need:
Deep Learning Toolbox ™
Deep Learning HDL Toolbox ™
Deep Learning Toolbox Model Quantization Library
Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
MATLAB Coder Interface for Deep Learning.
Load Pretrained Network
Load the pretrained LogoNet network and analyze the network architecture.
snet = getLogoNetwork; deepNetworkDesigner(snet);
Load Data
This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an augmentedImageDatastore
object for calibration and validation. Expedite calibration and validation by reducing the calibration data set to 20 images. The MATLAB simulation workflow has a maximum limit of five images when validating the quantized network. Reduce the validation data set sizes to five images. The FPGA validation workflow has a maximum limit of one image when validating the quantized network. Reduce the FPGA validation data set to a single image.
curDir = pwd; unzip("logos_dataset.zip"); imageData = imageDatastore(fullfile(curDir,'logos_dataset'),... 'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames'); [calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized'); calibrationData_reduced = calibrationData.subset(1:20); validationData_simulation = validationData.subset(1:5); validationData_FPGA = validationData.subset(1:1);
Generate Calibration Result File for the Network
Create a dlquantizer
object and specify the network to quantize. Specify the execution environment as FPGA.
dlQuantObj_simulation = dlquantizer(snet,'ExecutionEnvironment',"FPGA",'Simulation','on'); dlQuantObj_FPGA = dlquantizer(snet,'ExecutionEnvironment',"FPGA");
Use the calibrate
function to exercise the network with sample inputs and collect the range information. The calibrate
function collects the dynamic ranges of the weights and biases. The calibrate function returns a table. Each row of the table contains range information for a learnable parameter of the quantized network.
calibrate(dlQuantObj_simulation,calibrationData_reduced)
ans=35×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ __________________ ________________________ ___________ __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 193.72
⋮
calibrate(dlQuantObj_FPGA,calibrationData_reduced)
ans=35×5 table
Optimized Layer Name Network Layer Name Learnables / Activations MinValue MaxValue
____________________________ __________________ ________________________ ___________ __________
{'conv_1_Weights' } {'conv_1' } "Weights" -0.048978 0.039352
{'conv_1_Bias' } {'conv_1' } "Bias" 0.99996 1.0028
{'conv_2_Weights' } {'conv_2' } "Weights" -0.055518 0.061901
{'conv_2_Bias' } {'conv_2' } "Bias" -0.00061171 0.00227
{'conv_3_Weights' } {'conv_3' } "Weights" -0.045942 0.046927
{'conv_3_Bias' } {'conv_3' } "Bias" -0.0013998 0.0015218
{'conv_4_Weights' } {'conv_4' } "Weights" -0.045967 0.051
{'conv_4_Bias' } {'conv_4' } "Bias" -0.00164 0.0037892
{'fc_1_Weights' } {'fc_1' } "Weights" -0.051394 0.054344
{'fc_1_Bias' } {'fc_1' } "Bias" -0.00052319 0.00084454
{'fc_2_Weights' } {'fc_2' } "Weights" -0.05016 0.051557
{'fc_2_Bias' } {'fc_2' } "Bias" -0.0017564 0.0018502
{'fc_3_Weights' } {'fc_3' } "Weights" -0.050706 0.04678
{'fc_3_Bias' } {'fc_3' } "Bias" -0.02951 0.024855
{'imageinput' } {'imageinput'} "Activations" 0 255
{'imageinput_normalization'} {'imageinput'} "Activations" -139.34 193.72
⋮
Create Target Object
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2020.2. To set the Xilinx Vivado toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');
To create the target object, enter:
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Alternatively, you can also use the JTAG interface.
% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
Create dlQuantizationOptions
Object
Create a dlquantizationOptions
object. Specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.
options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget); options_simulation = dlquantizationOptions;
To use a custom metric function, specify the metric function in the dlquantizationOptions
object.
options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_FPGA)},'Bitstream','zcu102_int8','Target',hTarget); options_simulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_simulation)})
Validate Quantized Network
Use the validate
function to quantize the learnable parameters in the convolution layers of the network. The validate
function simulates the quantized network in MATLAB. The validate
function uses the metric function defined in the dlquantizationOptions
object to compare the results of the single-data-type network object to the results of the quantized network object.
prediction_simulation = dlQuantObj_simulation.validate(validationData_simulation,options_simulation)
Compiling leg: conv_1>>relu_4 ... Compiling leg: conv_1>>relu_4 ... complete. Compiling leg: maxpool_4 ... Compiling leg: maxpool_4 ... complete. Compiling leg: fc_1>>fc_3 ... Compiling leg: fc_1>>fc_3 ... complete.
prediction_simulation = struct with fields:
NumSamples: 5
MetricResults: [1×1 struct]
Statistics: []
For validation on an FPGA, the validate function:
Programs the FPGA board by using the output of the
compile
method and the programming fileDownloads the network weights and biases
Compares the performance of the network before and after quantization
prediction_FPGA = dlQuantObj_FPGA.validate(validationData_FPGA,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ... ### Targeting FPGA bitstream zcu102_int8. ### The network includes the following layers: 1 'imageinput' Image Input 227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations (SW Layer) 2 'conv_1' Convolution 96 5×5×3 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 3 'relu_1' ReLU ReLU (HW Layer) 4 'maxpool_1' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 5 'conv_2' Convolution 128 3×3×96 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 6 'relu_2' ReLU ReLU (HW Layer) 7 'maxpool_2' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 8 'conv_3' Convolution 384 3×3×128 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer) 9 'relu_3' ReLU ReLU (HW Layer) 10 'maxpool_3' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 11 'conv_4' Convolution 128 3×3×384 convolutions with stride [2 2] and padding [0 0 0 0] (HW Layer) 12 'relu_4' ReLU ReLU (HW Layer) 13 'maxpool_4' Max Pooling 3×3 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer) 14 'fc_1' Fully Connected 2048 fully connected layer (HW Layer) 15 'relu_5' ReLU ReLU (HW Layer) 16 'dropout_1' Dropout 50% dropout (HW Layer) 17 'fc_2' Fully Connected 2048 fully connected layer (HW Layer) 18 'relu_6' ReLU ReLU (HW Layer) 19 'dropout_2' Dropout 50% dropout (HW Layer) 20 'fc_3' Fully Connected 32 fully connected layer (HW Layer) 21 'softmax' Softmax softmax (HW Layer) 22 'classoutput' Classification Output crossentropyex with 'adidas' and 31 other classes (SW Layer) ### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. ### Compiling layer group: conv_1>>relu_4 ... ### Compiling layer group: conv_1>>relu_4 ... complete. ### Compiling layer group: maxpool_4 ... ### Compiling layer group: maxpool_4 ... complete. ### Compiling layer group: fc_1>>fc_3 ... ### Compiling layer group: fc_1>>fc_3 ... complete. ### Allocating external memory buffers: offset_name offset_address allocated_space _______________________ ______________ ________________ "InputDataOffset" "0x00000000" "12.0 MB" "OutputResultOffset" "0x00c00000" "4.0 MB" "SchedulerDataOffset" "0x01000000" "4.0 MB" "SystemBufferOffset" "0x01400000" "36.0 MB" "InstructionDataOffset" "0x03800000" "8.0 MB" "ConvWeightDataOffset" "0x04000000" "12.0 MB" "FCWeightDataOffset" "0x04c00000" "12.0 MB" "EndOffset" "0x05800000" "Total: 88.0 MB" ### Network compilation complete. ### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA. ### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA. ### Finished writing input activations. ### Running single input activation. Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 248358 274080 90.62 DSPs 384 2520 15.24 Block RAM 581 912 63.71 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware. ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software. ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software. Deep Learning Processor Estimator Performance Results LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s ------------- ------------- --------- --------- --------- Network 40142478 0.18247 1 40142478 5.5 ____imageinput_norm 216472 0.00098 ____conv_1 6825671 0.03103 ____maxpool_1 3755088 0.01707 ____conv_2 10440701 0.04746 ____maxpool_2 1447840 0.00658 ____conv_3 9405685 0.04275 ____maxpool_3 1765856 0.00803 ____conv_4 1819636 0.00827 ____maxpool_4 28098 0.00013 ____fc_1 2651288 0.01205 ____fc_2 1696632 0.00771 ____fc_3 89511 0.00041 * The clock frequency of the DL processor is: 220MHz Deep Learning Processor Bitstream Build Info Resource Utilized Total Percentage ------------------ ---------- ------------ ------------ LUTs (CLB/ALM)* 168645 274080 61.53 DSPs 800 2520 31.75 Block RAM 453 912 49.67 * LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices. ### Finished writing input activations. ### Running single input activation.
prediction_FPGA = struct with fields:
NumSamples: 1
MetricResults: [1×1 struct]
Statistics: [2×7 table]
View Performance of Quantized Neural Network
Display the accuracy of the quantized network.
prediction_simulation.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
prediction_FPGA.MetricResults.Result
ans=2×2 table
NetworkImplementation MetricOutput
_____________________ ____________
{'Floating-Point'} 1
{'Quantized' } 1
Display the performance of the quantized network in frames per second.
prediction_FPGA.Statistics.FramesPerSecond(2)
ans = 19.0828
See Also
dlhdl.Workflow
| dlhdl.Target
| compile
| deploy
| predict
| dlquantizer
| dlquantizationOptions
| calibrate
| validate
| Deep Network Designer