Quantize Network for FPGA Deployment

Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. This example shows how to use Deep Learning Toolbox Model Quantization Library and Deep Learning HDL Toolbox to deploy the int8 network to a target FPGA board.
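As a rough illustration of the memory saving, the following sketch counts the convolution weights of the network and compares their storage in single precision and in int8. The layer shapes are copied from the layer listing in the compile log later in this example. Biases, fully connected layers, and any per-layer scaling metadata are ignored, so treat the numbers as an approximation rather than the toolbox's own accounting.

```matlab
% Convolution layer shapes as [numFilters height width channels],
% copied from the layer listing in the compile log of this example.
convShapes = [ 96 5 5   3;    % conv_1
              128 3 3  96;    % conv_2
              384 3 3 128;    % conv_3
              128 3 3 384];   % conv_4

% Number of weights per layer (biases ignored for simplicity).
numWeights = prod(convShapes, 2);

% Storage in bytes: single uses 4 bytes per value, int8 uses 1 byte.
bytesSingle = 4 * sum(numWeights);
bytesInt8   = 1 * sum(numWeights);

fprintf('Convolution weights: %d values\n', sum(numWeights));
fprintf('single: %.2f MB, int8: %.2f MB\n', bytesSingle/2^20, bytesInt8/2^20);
```

Under these assumptions the weight storage shrinks by a factor of four.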

For this example, you need:

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

  • Deep Learning Toolbox Model Quantization Library

  • Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices

  • MATLAB Coder Interface for Deep Learning

Load Pretrained Network

Load the pretrained LogoNet network and analyze the network architecture.

snet = getLogoNetwork;
deepNetworkDesigner(snet);

Load Data

This example uses the logos_dataset data set. The data set consists of 320 images. Each image is 227-by-227 in size and has three color channels (RGB). Create an imageDatastore object for the images and split it into calibration and validation sets. To expedite calibration, reduce the calibration data set to 20 images. The MATLAB simulation workflow validates at most five images, so reduce the simulation validation data set to five images. The FPGA validation workflow validates at most one image, so reduce the FPGA validation data set to a single image.

curDir = pwd;
unzip("logos_dataset.zip");
imageData = imageDatastore(fullfile(curDir,'logos_dataset'),...
'IncludeSubfolders',true,'FileExtensions','.JPG','LabelSource','foldernames');
[calibrationData, validationData] = splitEachLabel(imageData, 0.5,'randomized');
calibrationData_reduced = calibrationData.subset(1:20);
validationData_simulation = validationData.subset(1:5);
validationData_FPGA = validationData.subset(1:1);

Generate Calibration Result File for the Network

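Create two dlquantizer objects for the network to quantize: one for simulation in MATLAB and one for validation on the FPGA. For both objects, specify the execution environment as FPGA. For the simulation object, also set the Simulation option to 'on'.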

dlQuantObj_simulation = dlquantizer(snet,'ExecutionEnvironment',"FPGA",'Simulation','on');
dlQuantObj_FPGA = dlquantizer(snet,'ExecutionEnvironment',"FPGA");

Use the calibrate function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights, biases, and activations. The function returns a table. Each row of the table contains range information for a learnable parameter or an activation of the quantized network.

calibrate(dlQuantObj_simulation,calibrationData_reduced)
ans=35×5 table
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        193.72
      ⋮

calibrate(dlQuantObj_FPGA,calibrationData_reduced)
ans=35×5 table
        Optimized Layer Name        Network Layer Name    Learnables / Activations     MinValue       MaxValue 
    ____________________________    __________________    ________________________    ___________    __________

    {'conv_1_Weights'          }      {'conv_1'    }           "Weights"                -0.048978      0.039352
    {'conv_1_Bias'             }      {'conv_1'    }           "Bias"                     0.99996        1.0028
    {'conv_2_Weights'          }      {'conv_2'    }           "Weights"                -0.055518      0.061901
    {'conv_2_Bias'             }      {'conv_2'    }           "Bias"                 -0.00061171       0.00227
    {'conv_3_Weights'          }      {'conv_3'    }           "Weights"                -0.045942      0.046927
    {'conv_3_Bias'             }      {'conv_3'    }           "Bias"                  -0.0013998     0.0015218
    {'conv_4_Weights'          }      {'conv_4'    }           "Weights"                -0.045967         0.051
    {'conv_4_Bias'             }      {'conv_4'    }           "Bias"                    -0.00164     0.0037892
    {'fc_1_Weights'            }      {'fc_1'      }           "Weights"                -0.051394      0.054344
    {'fc_1_Bias'               }      {'fc_1'      }           "Bias"                 -0.00052319    0.00084454
    {'fc_2_Weights'            }      {'fc_2'      }           "Weights"                 -0.05016      0.051557
    {'fc_2_Bias'               }      {'fc_2'      }           "Bias"                  -0.0017564     0.0018502
    {'fc_3_Weights'            }      {'fc_3'      }           "Weights"                -0.050706       0.04678
    {'fc_3_Bias'               }      {'fc_3'      }           "Bias"                    -0.02951      0.024855
    {'imageinput'              }      {'imageinput'}           "Activations"                    0           255
    {'imageinput_normalization'}      {'imageinput'}           "Activations"              -139.34        193.72
      ⋮
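To see why the calibrated ranges matter, the following sketch maps one range from the table above (the conv_1 weights) to an 8-bit scaled integer representation. It assumes a symmetric power-of-two scale chosen so that the largest magnitude fits in int8. This scheme is an illustrative assumption and is not confirmed here as the scaling that the toolbox applies internally. The sample weight values in w are made up for the demonstration.

```matlab
% Calibrated range of the conv_1 weights, taken from the table above.
minVal = -0.048978;
maxVal =  0.039352;

% Assumed scheme: smallest power-of-two scale such that the largest
% magnitude still fits in the int8 range after division by the scale.
absMax   = max(abs([minVal maxVal]));
exponent = ceil(log2(absMax / 127));
scale    = 2^exponent;

% Hypothetical sample weights inside the calibrated range.
w = [-0.048978 0 0.02 0.039352];

% Quantize: divide by the scale, round, and store as int8 (int8 saturates).
q = int8(round(w / scale));

% Dequantize and measure the worst-case rounding error.
wHat   = double(q) * scale;
maxErr = max(abs(w - wHat));

fprintf('scale = 2^%d, max error = %.3g\n', exponent, maxErr);
```

A narrower calibrated range allows a smaller scale, which in turn reduces the rounding error, which never exceeds half the scale for values inside the range.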

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2020.2. To set the Xilinx Vivado toolpath, enter:

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2020.2\bin\vivado.bat');

To create the target object, enter:

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');

Alternatively, you can also use the JTAG interface.

% hTarget = dlhdl.Target('Xilinx', 'Interface', 'JTAG');

Create dlquantizationOptions Object

Create a dlquantizationOptions object for each workflow. For the FPGA workflow, specify the target bitstream and target board interface. The default metric function is a Top-1 accuracy metric function.

options_FPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget);
options_simulation = dlquantizationOptions;

To use a custom metric function, specify the metric function in the dlquantizationOptions object.

options_FPGA = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_FPGA)},'Bitstream','zcu102_int8','Target',hTarget);
options_simulation = dlquantizationOptions('MetricFcn',{@(x)hComputeAccuracy(x,snet,validationData_simulation)});
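The body of the hComputeAccuracy helper is not shown in this example. The following sketch illustrates the kind of Top-1 accuracy computation such a metric function might perform. It assumes that the prediction scores arrive as one row per image and one column per class, and it uses made-up class names, scores, and labels. The variable names are hypothetical and do not come from the helper itself.

```matlab
% Hypothetical class list and ground-truth labels for three images.
classes = categorical(["adidas" "bmw" "nike"]);
labels  = categorical(["adidas" "nike" "adidas"], categories(classes));

% Toy prediction scores: one row per image, one column per class
% (assumed layout).
scores = [0.7 0.2 0.1;
          0.1 0.3 0.6;
          0.2 0.5 0.3];

% Top-1 prediction: the class with the highest score in each row.
[~, idx]  = max(scores, [], 2);
predicted = classes(idx);

% Accuracy is the fraction of images whose top prediction matches the label.
accuracy = mean(predicted(:) == labels(:));

fprintf('Top-1 accuracy: %.4f\n', accuracy);
```

In this toy data the first two images are classified correctly and the third is not, so the accuracy is two thirds.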

Validate Quantized Network

Use the validate function to quantize the learnable parameters in the convolution layers of the network. The validate function simulates the quantized network in MATLAB. The validate function uses the metric function defined in the dlquantizationOptions object to compare the results of the single-data-type network object to the results of the quantized network object.

prediction_simulation = dlQuantObj_simulation.validate(validationData_simulation,options_simulation)
Compiling leg: conv_1>>relu_4 ...
Compiling leg: conv_1>>relu_4 ... complete.
Compiling leg: maxpool_4 ...
Compiling leg: maxpool_4 ... complete.
Compiling leg: fc_1>>fc_3 ...
Compiling leg: fc_1>>fc_3 ... complete.
prediction_simulation = struct with fields:
       NumSamples: 5
    MetricResults: [1×1 struct]
       Statistics: []

For validation on an FPGA, the validate function:

  • Programs the FPGA board by using the output of the compile method and the programming file

  • Downloads the network weights and biases

  • Compares the performance of the network before and after quantization

prediction_FPGA = dlQuantObj_FPGA.validate(validationData_FPGA,options_FPGA)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### The network includes the following layers:
     1   'imageinput'    Image Input             227×227×3 images with 'zerocenter' normalization and 'randfliplr' augmentations  (SW Layer)
     2   'conv_1'        Convolution             96 5×5×3 convolutions with stride [1  1] and padding [0  0  0  0]                (HW Layer)
     3   'relu_1'        ReLU                    ReLU                                                                             (HW Layer)
     4   'maxpool_1'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
     5   'conv_2'        Convolution             128 3×3×96 convolutions with stride [1  1] and padding [0  0  0  0]              (HW Layer)
     6   'relu_2'        ReLU                    ReLU                                                                             (HW Layer)
     7   'maxpool_2'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
     8   'conv_3'        Convolution             384 3×3×128 convolutions with stride [1  1] and padding [0  0  0  0]             (HW Layer)
     9   'relu_3'        ReLU                    ReLU                                                                             (HW Layer)
    10   'maxpool_3'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
    11   'conv_4'        Convolution             128 3×3×384 convolutions with stride [2  2] and padding [0  0  0  0]             (HW Layer)
    12   'relu_4'        ReLU                    ReLU                                                                             (HW Layer)
    13   'maxpool_4'     Max Pooling             3×3 max pooling with stride [2  2] and padding [0  0  0  0]                      (HW Layer)
    14   'fc_1'          Fully Connected         2048 fully connected layer                                                       (HW Layer)
    15   'relu_5'        ReLU                    ReLU                                                                             (HW Layer)
    16   'dropout_1'     Dropout                 50% dropout                                                                      (HW Layer)
    17   'fc_2'          Fully Connected         2048 fully connected layer                                                       (HW Layer)
    18   'relu_6'        ReLU                    ReLU                                                                             (HW Layer)
    19   'dropout_2'     Dropout                 50% dropout                                                                      (HW Layer)
    20   'fc_3'          Fully Connected         32 fully connected layer                                                         (HW Layer)
    21   'softmax'       Softmax                 softmax                                                                          (HW Layer)
    22   'classoutput'   Classification Output   crossentropyex with 'adidas' and 31 other classes                                (SW Layer)
                                                                                                                                
### Notice: The layer 'imageinput' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>relu_4 ...
### Compiling layer group: conv_1>>relu_4 ... complete.
### Compiling layer group: maxpool_4 ...
### Compiling layer group: maxpool_4 ... complete.
### Compiling layer group: fc_1>>fc_3 ...
### Compiling layer group: fc_1>>fc_3 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "12.0 MB"       
    "OutputResultOffset"        "0x00c00000"     "4.0 MB"        
    "SchedulerDataOffset"       "0x01000000"     "4.0 MB"        
    "SystemBufferOffset"        "0x01400000"     "36.0 MB"       
    "InstructionDataOffset"     "0x03800000"     "8.0 MB"        
    "ConvWeightDataOffset"      "0x04000000"     "12.0 MB"       
    "FCWeightDataOffset"        "0x04c00000"     "12.0 MB"       
    "EndOffset"                 "0x05800000"     "Total: 88.0 MB"

### Network compilation complete.

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Deep learning network programming has been skipped as the same network is already loaded on the target FPGA.
### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Bitstream Build Info

Resource                   Utilized           Total        Percentage
------------------        ----------      ------------    ------------
LUTs (CLB/ALM)*              248358            274080           90.62
DSPs                            384              2520           15.24
Block RAM                       581               912           63.71
* LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices.

### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   40142478                  0.18247                       1           40142478              5.5
    ____imageinput_norm     216472                  0.00098 
    ____conv_1             6825671                  0.03103 
    ____maxpool_1          3755088                  0.01707 
    ____conv_2            10440701                  0.04746 
    ____maxpool_2          1447840                  0.00658 
    ____conv_3             9405685                  0.04275 
    ____maxpool_3          1765856                  0.00803 
    ____conv_4             1819636                  0.00827 
    ____maxpool_4            28098                  0.00013 
    ____fc_1               2651288                  0.01205 
    ____fc_2               1696632                  0.00771 
    ____fc_3                 89511                  0.00041 
 * The clock frequency of the DL processor is: 220MHz




              Deep Learning Processor Bitstream Build Info

Resource                   Utilized           Total        Percentage
------------------        ----------      ------------    ------------
LUTs (CLB/ALM)*              168645            274080           61.53
DSPs                            800              2520           31.75
Block RAM                       453               912           49.67
* LUT count represents Configurable Logic Block(CLB) utilization in Xilinx devices and Adaptive Logic Module (ALM) utilization in Intel devices.

### Finished writing input activations.
### Running single input activation.
prediction_FPGA = struct with fields:
       NumSamples: 1
    MetricResults: [1×1 struct]
       Statistics: [2×7 table]

View Performance of Quantized Neural Network

Display the accuracy of the quantized network.

prediction_simulation.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1      

prediction_FPGA.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1      
     {'Quantized'     }           1      

Display the performance of the quantized network in frames per second.

prediction_FPGA.Statistics.FramesPerSecond(2)
ans = 19.0828
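The Frames/s value in the estimator performance table printed during validation follows from the cycle count and the clock frequency reported in that table. A small sketch of that arithmetic, using the values from the table:

```matlab
% Values copied from the estimator performance table in this example.
totalCycles = 40142478;   % LastFrameLatency(cycles) for the whole network
clockHz     = 220e6;      % clock frequency of the DL processor (220 MHz)

% Latency in seconds is the cycle count divided by the clock frequency.
latencySeconds = totalCycles / clockHz;

% Throughput for a single frame is the reciprocal of the latency.
framesPerSecond = 1 / latencySeconds;

fprintf('Latency: %.5f s, throughput: %.1f frames/s\n', ...
    latencySeconds, framesPerSecond);
```

The result agrees with the 0.18247 seconds and 5.5 frames per second listed in the table.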
