What Is Half Precision?
This video introduces the concept of half precision, or float16, a relatively new floating-point data type. It can be used to cut memory usage in half and has become very popular for accelerating deep learning training and inference. We also look at the benefits and tradeoffs compared with 32-bit single-precision or 64-bit double-precision data types in traditional control applications.
Published: 2 Jan 2020
Half precision, or float16, is a relatively new floating-point data type that uses 16 bits, unlike the traditional 32-bit single-precision or 64-bit double-precision data types.
So, when you declare a variable as half in MATLAB, say the number pi, you may notice some loss of precision compared to the single or double representation, as we see here.
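The same loss of precision can be reproduced outside MATLAB; as a quick sketch, NumPy's float16 uses the same IEEE 754 binary16 encoding as the half type:

```python
import numpy as np

# Illustration of precision loss when storing pi in half precision.
# NumPy's float16 implements the same IEEE 754 binary16 format as half.
pi_half = np.float16(np.pi)    # 3.140625 -- only ~3 decimal digits survive
pi_single = np.float32(np.pi)  # 3.1415927
pi_double = np.float64(np.pi)  # 3.141592653589793

print(pi_half, pi_single, pi_double)
```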
The difference comes from the limited number of bits used by half precision. We have only 10 bits for the significand and 5 bits for the exponent, as opposed to 23 bits for the significand and 8 bits for the exponent in single precision. Hence the eps is much larger, and the dynamic range is limited.
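A quick way to see both effects is to query the machine parameters of each type; here is a sketch using NumPy's `finfo`, which reports the same binary16 characteristics:

```python
import numpy as np

half = np.finfo(np.float16)
single = np.finfo(np.float32)

# eps: spacing between 1.0 and the next representable number
print(half.eps)    # 2**-10  ~= 0.000977
print(single.eps)  # 2**-23  ~= 1.19e-07

# dynamic range: largest finite value of each type
print(half.max)    # 65504.0
print(single.max)  # ~3.4e+38
```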
So why is it important? Half's recent popularity stems from its usefulness in accelerating deep learning training and inference, mainly on NVIDIA GPUs, as highlighted in the articles here. In addition, both Intel and ARM platforms support half to accelerate computations.
The obvious benefit of using half precision is in reducing memory usage and data bandwidth by 50%, as we see here for ResNet-50. In addition, hardware vendors provide hardware acceleration for computations in half, such as the CUDA intrinsics in the case of NVIDIA GPUs.
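The 50% saving follows directly from the element size: 2 bytes per value instead of 4. A minimal sketch (the parameter count below is an assumption roughly matching a ResNet-50-sized model, used only for illustration):

```python
import numpy as np

# Hypothetical model with ~25.6 million parameters, stored in single
# precision (4 bytes each) versus half precision (2 bytes each).
n_params = 25_600_000
weights32 = np.zeros(n_params, dtype=np.float32)
weights16 = weights32.astype(np.float16)

print(weights32.nbytes / 1e6)  # ~102.4 MB
print(weights16.nbytes / 1e6)  # ~51.2 MB -- a 50% reduction
```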
We are seeing traditional applications, such as powertrain control systems, do the same. There you may have data in the form of lookup tables, as shown in a simple illustration here. By using half as the storage type, you are able to reduce the memory footprint of this 2D lookup table by 4x compared to double.
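As a sketch of that 4x figure, here is a made-up 2D lookup table (the breakpoints and table values are invented for illustration) stored first in double precision and then in half:

```python
import numpy as np

# Hypothetical 2D lookup table: engine speed x load -> some output value.
speed = np.linspace(800.0, 6000.0, 64)   # 64 breakpoints
load = np.linspace(0.0, 1.0, 32)         # 32 breakpoints
table64 = np.outer(load, np.sqrt(speed)) # double precision by default
table16 = table64.astype(np.float16)     # half as the storage type

# 8 bytes per element down to 2 bytes per element: a 4x smaller footprint
print(table64.nbytes // table16.nbytes)  # 4
```

In practice the half values would be converted back to a wider type for interpolation, keeping half only as the storage format.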
However, it is important to understand the tradeoff of the limited precision and range of half precision. For instance, in the case of the deep learning network, the quantization error was on the order of 10^-4, and one has to analyze how this impacts the overall accuracy of the network.
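That order of magnitude can be checked with a simple round-trip experiment; the random values below stand in for trained network weights (typically of magnitude around 1) and are an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for trained network weights in roughly [-1, 1]
weights = rng.uniform(-1.0, 1.0, 10_000).astype(np.float32)

# Round-trip through half and measure the worst-case quantization error
quantized = weights.astype(np.float16).astype(np.float32)
err = np.abs(weights - quantized).max()
print(err)  # on the order of 1e-4 for values of this magnitude
```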
This was a short introduction to half precision. Please refer to the links below to learn more about how to simulate and generate C/C++ or CUDA code from half in MATLAB and Simulink.