Pruning, Projection, and Quantization

Compress deep neural networks, reduce network memory, and prepare networks for code generation

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Compression Library support package to reduce the memory footprint and computational requirements of a deep neural network:

  • Prune filters from convolution layers by using first-order Taylor approximation.

  • Project layers by performing principal component analysis (PCA) on the layer activations.

  • Quantize the weights, biases, and activations of layers to reduced-precision, scaled integer data types.

You can then generate code from the compressed network to deploy to your desired hardware.
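The three steps above can be sketched in MATLAB as follows. This is a minimal, hedged outline, not a complete workflow: the variable names (`net`, `mbqCalibration`) are illustrative, pruning additionally requires a custom training loop to accumulate Taylor-based filter scores, and the functions shown assume the Deep Learning Toolbox Model Compression Library support package is installed.

```matlab
% Hedged sketch of the compression workflow. Assumes `net` is a trained
% dlnetwork and `mbqCalibration` is a minibatchqueue of representative data.

% 1. Pruning: wrap the network so first-order Taylor scores for
%    convolution filters can be accumulated during retraining.
prunableNet = taylorPrunableNetwork(net);
% ... custom training loop that calls updateScore and updatePrunables ...

% 2. Projection: compress layers via PCA on layer activations.
projectedNet = compressNetworkUsingProjection(net, mbqCalibration);

% 3. Quantization: collect dynamic ranges on calibration data, then
%    quantize weights, biases, and activations to scaled integers.
quantObj     = dlquantizer(projectedNet);
calResults   = calibrate(quantObj, mbqCalibration);
quantizedNet = quantize(quantObj);
```

In practice the steps are often applied in sequence (prune, retrain, project, then quantize), since each stage changes the network that the next stage operates on.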

[Figure: simplified illustration of compression, showing a 20 MB neural network reduced to a 5 MB network.]

Categories

  • Get Started with Network Compression
    Learn the basics of the Deep Learning Toolbox Model Compression Library
  • Pruning
    Prune network filters using first-order Taylor approximation; reduce number of learnable parameters
  • Projection
    Project network layers using principal component analysis (PCA); reduce number of learnable parameters
  • Quantization
    Quantize network parameters to reduced-precision data types; prepare deep learning network for fixed-point code generation
  • Network Compression Applications
    Explore deep learning model compression in end-to-end workflows

Featured Examples