Using Lookup Tables to Accelerate Deep Learning Inference
This video shows how to use the lookup table optimization capability to generate an efficient lookup table for the sigmoid function, a key activation function in deep learning networks. We then compare the relative speedup on an Arduino Due® and an STMicroelectronics® discovery board by running the generated code in hardware-in-the-loop simulation.
Published: 19 Nov 2019
A lookup table (LUT) is a key construct in embedded designs and is often used to speed up the run-time execution of certain functions in your algorithm. For instance, trigonometric functions are often replaced with a more efficient LUT implementation.
Let's try a simple experiment: applying the same principle to the sigmoid function to investigate how we can accelerate deep learning inference, particularly on the edge.
The sigmoid function is a key building block of neural networks and one of the most commonly used nonlinear activation functions in deep learning.
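For reference, here is a minimal sketch of the function being approximated. The video does not spell out the formula, so this assumes the standard logistic definition, 1 / (1 + exp(-x)); the name `sigmoid` and the two-branch form (used to avoid overflow for large negative inputs) are illustrative choices, not taken from the Simulink model.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), assumed to be the function in the video."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)
```

The output always stays between 0 and 1, which is the boundedness that the lookup table relies on.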
Here we have a simple Simulink subsystem that models the sigmoid function. I am going to use the Lookup Table Optimizer app to generate an optimal LUT, specifying the input and output data types. Because the sigmoid is a bounded function, I can specify the bounds on the output and, finally, an output tolerance of 1%.
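To give a feel for the kind of problem being solved, here is a rough sketch in Python. It is not the algorithm the Lookup Table Optimizer uses; it is a brute-force search for the smallest evenly spaced table that meets a tolerance. Several things are assumptions for illustration: the input range of -8 to 8, reading the 1% tolerance as an absolute error of 0.01, linear interpolation between breakpoints, and all of the function names. It also works in floating point and ignores the data type choices made in the app.

```python
import math

X_MIN, X_MAX = -8.0, 8.0  # assumed input range; inputs outside it are clamped
TOL = 0.01                # assumed reading of "1%" as absolute output error


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def build_lut(n):
    """Sample the sigmoid at n evenly spaced breakpoints on [X_MIN, X_MAX]."""
    step = (X_MAX - X_MIN) / (n - 1)
    return [sigmoid(X_MIN + i * step) for i in range(n)]


def lut_sigmoid(x, table):
    """Clamp x to the table range, then interpolate linearly between breakpoints."""
    n = len(table)
    x = min(max(x, X_MIN), X_MAX)
    pos = (x - X_MIN) / (X_MAX - X_MIN) * (n - 1)
    i = min(int(pos), n - 2)  # index of the left breakpoint
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


def max_abs_error(table, samples=4001):
    """Worst-case error on a dense grid that extends past the table range."""
    worst = 0.0
    for k in range(samples):
        x = -12.0 + 24.0 * k / (samples - 1)
        worst = max(worst, abs(lut_sigmoid(x, table) - sigmoid(x)))
    return worst


def smallest_table(tol=TOL, max_points=1024):
    """Return the first (smallest) evenly spaced table that meets the tolerance."""
    for n in range(2, max_points + 1):
        table = build_lut(n)
        if max_abs_error(table) <= tol:
            return table
    raise ValueError("no table within max_points meets the tolerance")


table = smallest_table()
print(len(table), max_abs_error(table))
```

Tightening the tolerance or widening the input range makes the table larger, which is the memory versus accuracy trade-off that the optimizer manages for you.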
Once the optimization problem is solved, we can look at the comparison plot to verify that the error of the LUT approximation is within our specified tolerance.
Now, as a next step, let's generate C code for both the sigmoid function and the generated LUT and deploy it to a Cortex-M platform such as the Arduino board.
We use hardware-in-the-loop simulation to run the generated code with inputs from Simulink. Running the code in this mode adds some overhead, but it still gives us a good comparison of the relative execution speed.
As you can see from the execution profile, the LUT is 2.5x faster on the Arduino. I repeated the same test on a Cortex-M7-based STMicro discovery board. Here is a plot showing the relative speedup of the lookup table with different data types.
In fact, this can scale up if you share the lookup table approximation between all neurons, further decreasing the execution time by orders of magnitude. You can do the same experiment with other activation functions, such as the hyperbolic tangent.
To learn more about optimizing LUTs in your design, please refer to the additional links below the video.
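As a small illustration of sharing one table, here is a hedged sketch in which every neuron of a layer calls the same LUT. It also derives a hyperbolic tangent from that table through the identity tanh(x) = 2*sigmoid(2x) - 1. The table size of 33 points, the input range of -8 to 8, and all names are assumptions for illustration; this is not the generated code from the video and it says nothing about the speedup on real hardware.

```python
import math

X_MIN, X_MAX, N = -8.0, 8.0, 33  # assumed range and table size
STEP = (X_MAX - X_MIN) / (N - 1)

# One table, built once and shared by every neuron.
TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP))) for i in range(N)]


def lut_sigmoid(x):
    """Clamp to the table range and interpolate linearly in the shared table."""
    x = min(max(x, X_MIN), X_MAX)
    pos = (x - X_MIN) / STEP
    i = min(int(pos), N - 2)
    return TABLE[i] + (pos - i) * (TABLE[i + 1] - TABLE[i])


def lut_tanh(x):
    """tanh from the same table, using tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * lut_sigmoid(2.0 * x) - 1.0


def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: one weighted sum per neuron, same activation for all."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


outputs = dense_layer(
    [0.5, -1.0],
    [[1.0, 2.0], [-0.5, 0.25]],
    [0.1, 0.0],
    lut_sigmoid,
)
print(outputs)
```

Because the table is stored once, adding neurons adds lookups but no extra table memory. Note that the tanh variant doubles the sigmoid table's error, so a tighter table may be needed to hold the same tolerance.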
Featured Product
Fixed-Point Designer