GPU Coder

Version 1.3, part of Release 2019a, includes the following enhancements:

  • YOLO V2 object detector: Generate code from a YOLO V2 object detector for cuDNN and TensorRT targets
  • TensorRT support: Generate code that takes advantage of FP16 optimization in deep learning inference applications
  • CUDA optimized transpose function: Apply transposes using shared memory for improved performance
  • Unbounded variable support: Generate code for variables whose size is unbounded, that is, whose maximum size is not known at compile time
  • Code generation for pdist, pdist2, and cwt functions: Generate code for additional statistics, machine learning, and wavelet functions

See the Release Notes for details.

Version 1.2, part of Release 2018b, includes the following enhancements:

  • Deep Learning Retargetability: Deploy applications that use deep learning networks to Intel MKL-DNN and NVIDIA TensorRT targets by using the codegen function
  • Thrust Library Support: Generate GPU-accelerated code for sort and reduction operations by using the Thrust library
  • Deep Learning Optimization: Improve performance and memory utilization through auto-tuning, layer fusion, and buffer minimization
  • gpuArray Support: Use gpuArray arguments as inputs and outputs of MEX targets
  • Support Package for NVIDIA GPUs: Target NVIDIA Jetson and DRIVE platforms
  • Calling External CUDA Functions: Pass GPU arguments by reference when calling external functions with coder.ceval
  • Deep Learning Layers: Generate code for new network layers
  • Ease-of-use and traceability improvements
  • Code generation for more Image Processing Toolbox functions
  • Deep learning examples

See the Release Notes for details.

Version 1.1, part of Release 2018a, includes the following enhancements:

  • Directed Acyclic Graph (DAG) Networks: Generate CUDA code for deep learning networks with DAG topology
  • Deep Learning Layers: Generate CUDA code for popular networks such as GoogLeNet, ResNet, and SegNet
  • TensorRT Support: Generate code that takes advantage of the NVIDIA TensorRT deep learning inference optimizer and runtime
  • Multi-Platform Deep Learning Targeting: Deploy deep learning networks to Intel and ARM processors

See the Release Notes for details.