Quantizing a Deep Learning Network in MATLAB
In this video, we demonstrate the deep learning quantization workflow in MATLAB. Using the Model Quantization Library Support Package, we illustrate how you can calibrate, quantize, and validate a deep learning network such as ResNet-50. We also highlight the impact of quantization on reducing the memory footprint of standard networks such as ResNet-101 and Inception-v3.
Published: 23 Apr 2020
Deep learning quantization is a key optimization strategy for the efficient deployment of deep learning networks, particularly on embedded platforms.
I am Ram Cherukuri, senior product manager at MathWorks, and in this video I will give you an overview of the deep learning quantization workflow in MATLAB.
Quantizing the weights, biases, and activations to lower precision data types like INT8 or FP16 significantly reduces the memory footprint of the AI algorithm and can result in improved inference performance on the embedded hardware.
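The arithmetic behind the savings is straightforward: an INT8 value takes one byte instead of the four bytes of a single-precision float. Here is a back-of-the-envelope sketch, assuming a round figure of 25.6 million learnable parameters for a ResNet-50-class network:

numParams = 25.6e6;               % assumed parameter count, roughly ResNet-50
fp32MB = numParams * 4 / 2^20;    % 4 bytes per single-precision value, ~98 MB
int8MB = numParams * 1 / 2^20;    % 1 byte per INT8 value, ~24 MB
fprintf('FP32: %.0f MB, INT8: %.0f MB (%.0fx smaller)\n', ...
    fp32MB, int8MB, fp32MB/int8MB);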
You can use the Model Quantization Library Support Package for quantizing your deep learning network in MATLAB. You can download it from the Add-On Explorer as shown here.
The quantization workflow instruments the network using a calibration datastore, collecting the statistics that are used to quantize the weights, biases, and activations of the network's layers.
Finally, the validation step computes accuracy metrics to analyze and understand the impact of quantization on the accuracy of the network. Let's take ResNet-50 as an example network to go through this workflow.
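Before turning to the app, here is a minimal command-line sketch of the same calibrate-and-validate flow, using the dlquantizer API from the support package; folder names and options here are illustrative:

net = resnet50;   % requires the Deep Learning Toolbox Model for ResNet-50 support package

% Illustrative folder names; the validation set is labeled by subfolder.
calImds = imageDatastore('calibrationImages');
valImds = imageDatastore('validationImages', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Resize images to the network's 224-by-224 input size.
calData = augmentedImageDatastore([224 224], calImds);
valData = augmentedImageDatastore([224 224], valImds);

quantObj = dlquantizer(net, 'ExecutionEnvironment', 'GPU');

% Instrumented forward passes collect the min/max ranges of the
% weights, biases, and activations of each layer.
calResults = calibrate(quantObj, calData);

% Quantize the supported layers and compare accuracy on the validation
% set; recent releases default to a top-1 accuracy metric.
valResults = validate(quantObj, valData);
valResults.MetricResults.Result   % floating-point vs. quantized accuracy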
Here is the Deep Network Quantizer app. You first import the network from the MATLAB workspace, and the app displays the network structure in the left pane.
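If you prefer to start from the command line, the same app can be opened programmatically (assuming the support package is installed):

deepNetworkQuantizer   % opens the Deep Network Quantizer app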
Next, you select the datastore you would like to use for calibration, and the app displays the computed statistics, such as the min and max values of the weights, biases, and activations of each layer. You can also choose which layers to quantize and then validate the impact of quantization using a validation datastore.
In this example, we used the default top-1 accuracy metric, and you can see that there is a 67% reduction in memory with no drop in accuracy. You can then proceed to generate code from the quantized network for deployment.
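Code generation for the quantized network follows the GPU Coder INT8 flow. A minimal sketch, assuming GPU Coder with the cuDNN target library and a hypothetical entry-point function resnet_predict.m that loads the network and calls predict:

save('quantObj.mat', 'quantObj');   % save the calibrated dlquantizer object

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.DeepLearningConfig.DataType = 'int8';                       % generate INT8 inference code
cfg.DeepLearningConfig.CalibrationResultFile = 'quantObj.mat';  % calibration statistics

% resnet_predict.m is a hypothetical entry point: it loads the network
% once and returns predict(net, in).
codegen -config cfg resnet_predict -args {ones(224,224,3,'single')}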
We repeated this workflow with a few networks, quantizing only the compute-intensive convolution layers to INT8.
You can see the impact of quantization in the chart here. For instance, ResNet-101, the largest network here at 180 MB, sees 72% compression with a 2% drop in accuracy. Inception-v3, on the other hand, has the largest drop in accuracy, 4%, with 67% compression, going from 100 MB to 33 MB.
This highlights the significant impact of quantization for efficient deployment of deep learning networks.
Please refer to the resources below the video to learn how to get started and explore these new capabilities in MATLAB.
Featured Product
Deep Learning Toolbox