statsml/compress-net-notes


This is a collection of papers on reducing model size and on ASIC/FPGA accelerators for machine learning, especially for deep-neural-network applications. (Inspired by Embedded-Neural-Network.)

You can use the following materials as your entry point:

Terminologies

  • Structural pruning (compression): compress CNNs by removing "less important" filters; see the sketch below.
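
To make the idea concrete, here is a minimal sketch of filter pruning, not taken from the listed papers: rank a convolutional layer's filters by L1 norm and keep only the largest ones. The weight layout (num_filters, in_channels, k, k) and the keep_ratio parameter are illustrative assumptions.

```python
import numpy as np

def prune_filters_by_l1(weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Keep the filters with the largest L1 norms (illustrative sketch).

    `weights` is assumed to be laid out as (num_filters, in_channels, k, k).
    """
    # L1 norm of each filter, summed over all of its coefficients.
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    num_keep = max(1, int(weights.shape[0] * keep_ratio))
    # Indices of the filters with the largest norms, re-sorted to keep order.
    keep = np.sort(np.argsort(norms)[-num_keep:])
    return weights[keep]

# Example: drop half the filters of a random 8-filter 3x3 conv layer.
w = np.random.randn(8, 3, 3, 3)
print(prune_filters_by_l1(w, keep_ratio=0.5).shape)  # (4, 3, 3, 3)
```

Note that after removing a filter, the corresponding input channel of the next layer must be removed as well; this sketch omits that bookkeeping.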

Network Compression

Reduce Precision

The paper "Deep neural networks are robust to weight binarization and other non-linear distortions" showed that DNNs can be robust to more than just weight binarization.
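
As a quick illustration of the kind of distortion studied there, below is a minimal sketch of deterministic weight binarization with a per-tensor scale, in the spirit of BinaryConnect/XNOR-Net; the scaling choice is an assumption for illustration.

```python
import numpy as np

def binarize_weights(weights: np.ndarray) -> np.ndarray:
    """Deterministic sign binarization with a per-tensor scale.

    Each weight becomes +alpha or -alpha, where alpha is the mean
    absolute value of the tensor (np.sign maps exact zeros to 0;
    real implementations usually send them to +1).
    """
    alpha = np.abs(weights).mean()
    return alpha * np.sign(weights)

w = np.random.randn(4, 4)
print(binarize_weights(w))
```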

Linear Quantization
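
This section has no notes yet; as a placeholder, here is a minimal sketch of uniform (linear) quantization, where floating-point values are mapped to evenly spaced integer levels. The 8-bit width and per-tensor min/max range are illustrative assumptions.

```python
import numpy as np

def linear_quantize(x: np.ndarray, num_bits: int = 8):
    """Uniform affine quantization over the tensor's [min, max] range."""
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)  # assumes x is not constant
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def linear_dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(5).astype(np.float32)
q, s, z = linear_quantize(x)
print(x)
print(linear_dequantize(q, s, z))  # close to x, up to quantization error
```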

Non-linear Quantization
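
Also without notes yet; one common non-linear scheme is logarithmic (power-of-two) quantization, where levels are spaced exponentially rather than evenly, so multiplications can become bit shifts in hardware. The 4-bit exponent budget below is an illustrative assumption.

```python
import numpy as np

def log_quantize(x: np.ndarray, num_bits: int = 4) -> np.ndarray:
    """Round each magnitude to the nearest power of two (non-linear levels)."""
    sign = np.sign(x)  # exact zeros stay zero
    exp = np.round(np.log2(np.abs(x) + 1e-12))
    # Clip exponents to the range representable with `num_bits` values.
    exp = np.clip(exp, -(2**(num_bits - 1)), 2**(num_bits - 1) - 1)
    return sign * 2.0**exp

x = np.random.randn(5)
print(x)
print(log_quantize(x))
```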

Reduce Number of Operations and Model Size

Exploiting Activation Statistics

  • To be updated.

Network Pruning

Network pruning: a large fraction of the weights in a network is redundant and can be removed (i.e., set to zero); the sketch below shows the simplest, magnitude-based version.
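
The sketch is illustrative, not from a specific paper, and the 90% sparsity target is an assumed parameter: zero out the weights with the smallest absolute values.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Set the smallest-magnitude weights to zero (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(1000)
w_sparse = magnitude_prune(w, sparsity=0.9)
print((w_sparse == 0).mean())  # ~0.9
```

In practice pruning is interleaved with retraining so that the remaining weights can compensate for the removed ones.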

Bayesian network pruning

  • [1711]. Interpreting Convolutional Neural Networks Through Compression - [notes][arXiv]
  • [1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [notes][arXiv]

Compact Network Architectures
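
A representative example is the MobileNet-style depthwise separable convolution, which replaces a standard k x k convolution with a per-channel depthwise convolution followed by a 1x1 pointwise convolution. A quick parameter count (illustrative arithmetic, with assumed layer sizes) shows the saving:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k conv (one filter per channel) plus 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer with 256 input and 256 output channels.
print(conv_params(3, 256, 256))                 # 589824
print(depthwise_separable_params(3, 256, 256))  # 67840, roughly 8.7x fewer
```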

Knowledge Distillation
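
The canonical formulation (Hinton et al., 2015) trains a small student to match the temperature-softened output distribution of a large teacher. Below is a minimal sketch of the soft-target loss; the temperature T=4 is an assumed hyperparameter, and in practice this term is combined with the usual hard-label cross-entropy.

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T: float = 4.0) -> float:
    """Cross-entropy between softened teacher and student distributions.

    The T**2 factor keeps gradient magnitudes comparable across
    temperatures, as suggested in the original paper.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-(p_teacher * np.log(p_student + 1e-12)).sum() * T**2)

teacher = np.array([5.0, 1.0, -2.0])  # teacher logits
student = np.array([3.0, 2.0, -1.0])  # student logits
print(distillation_loss(student, teacher))
```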

A Bit of Hardware

Contributors
