Compressing Neural Networks for Embedded AI: Pruning, Projection, and Quantization
This Tech Talk explores how to compress neural network models so they can run efficiently on embedded systems without sacrificing accuracy. Many neural networks are overparameterized: they contain more learnable weights and layers than the task actually requires. This excess can be systematically reduced with three techniques: pruning, projection, and quantization. Using a hands-on MATLAB® example, you’ll learn how to compress a trained model that classifies cracked pavement from acceleration data, achieving over a 94% reduction in model size while maintaining high accuracy. A minimal sketch of the workflow follows below.
Published: 8 Aug 2025
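
For orientation, here is a rough sketch of how the three techniques might chain together using Deep Learning Toolbox functions. It assumes a trained dlnetwork named `net` and a datastore of representative inputs named `calibrationData` (both hypothetical placeholders for the model and data in the example), plus the Deep Learning Toolbox Model Quantization Library support package; the pruning step is abbreviated because Taylor pruning requires a full custom fine-tuning loop.

```matlab
% --- Pruning (abbreviated): rank and remove low-importance filters ---
% Taylor pruning wraps the network, then scores filters inside a custom
% fine-tuning loop (omitted here) before removing the lowest-ranked ones.
prunableNet = taylorPrunableNetwork(net);
% ... custom training loop: compute gradients, call updateScore and
%     updatePrunables each iteration, then fine-tune the pruned network ...
netPruned = dlnetwork(prunableNet);   % convert back to a regular network

% --- Projection: replace layers with lower-rank equivalents ---
% Activations over the calibration data determine a smaller subspace onto
% which the learnable parameters are projected.
mbq = minibatchqueue(calibrationData, MiniBatchSize=64, ...
    MiniBatchFormat="CBT");   % format depends on your input data layout
netProjected = compressNetworkUsingProjection(netPruned, mbq, ...
    ExplainedVarianceGoal=0.95);   % retain 95% of activation variance

% --- Quantization: store weights and activations as 8-bit integers ---
quantObj = dlquantizer(netProjected, ExecutionEnvironment="MATLAB");
calibrate(quantObj, calibrationData);   % collect dynamic ranges
netQuantized = quantize(quantObj);

% Inspect which layers were quantized.
qDetails = quantizationDetails(netQuantized);
disp(qDetails.QuantizedLayerNames)
```

The order shown (prune, then project, then quantize) mirrors the progression in the talk; in practice each step is typically followed by a fine-tuning or validation pass to recover any lost accuracy before moving on.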