This is a list of papers on neural network pruning that I have read recently (still ongoing). A more complete list on this topic may be found on GitHub.

The format is {title} ({year}; {authors}; {organization}): {abstract-like description}. {links}. The ones marked with a leading ★ are inspiring to me.

Fine-grained (Weight) Pruning

Fine-grained pruning removes individual unimportant elements of weight tensors. It can achieve very high compression rates with little or no loss of accuracy, but running the pruned models requires libraries that support sparse computation.


Learning both Weights and Connections for Efficient Neural Networks (NIPS 2015; Song Han, Jeff Pool, John Tran, William J. Dally; Stanford, Nvidia): A three-step method: train to learn which connections are important, prune the unimportant connections, then retrain the remaining connections. Reduces the parameters of AlexNet by 9x and of VGG-16 by 13x without incurring accuracy loss. L2 regularization is better than L1 when pruning is followed by retraining. The dropout ratio during retraining should be smaller to account for the change in model capacity. Do not re-initialize parameters when retraining. Iterative pruning is the most important trick. Conv layers are more sensitive to pruning than FC layers. The pruned result seems to be a sparse network. Code.
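A minimal NumPy sketch of the prune step in the iterative loop described above, assuming one magnitude threshold per tensor. The function name `magnitude_prune` and the sparsity schedule are my own illustration, not the paper's code, and the retraining between rounds is only marked by a comment.

```python
import numpy as np

def magnitude_prune(weights, mask, sparsity):
    """Return a new 0/1 mask that zeroes the smallest-magnitude weights.

    `sparsity` is the target fraction of zeroed entries in the whole tensor.
    Entries that are already pruned (mask == 0) stay pruned.
    """
    magnitudes = np.abs(weights * mask)
    k = int(round(sparsity * weights.size))  # number of entries to zero
    if k == 0:
        return mask.copy()
    # The k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(magnitudes.ravel(), k - 1)[k - 1]
    return (magnitudes > threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
mask = np.ones_like(w)
for sparsity in (0.25, 0.5, 0.75):  # assumed schedule, for illustration only
    mask = magnitude_prune(w, mask, sparsity)
    w = w * mask  # a real pipeline would retrain the surviving weights here
```

Raising the sparsity gradually, with retraining in between, is the "iterative pruning" trick; a single jump to 75% would use the same function with one call.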

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR 2016; Song Han, Huizi Mao, William J. Dally; Stanford, Tsinghua): Combines pruning, quantization and Huffman coding to reduce the storage requirement by 35x (AlexNet) to 49x (VGG-16) without affecting accuracy. Models.
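The quantization stage shares weights through a small codebook. As I recall the paper's accounting, n weights of b bits each, shared among k centroids, cost n·log2(k) bits of indices plus k·b bits for the codebook. A small sketch of that arithmetic follows; the function name is mine, and pruning and Huffman coding are not modeled.

```python
import math

def weight_sharing_rate(n, k, b=32):
    """Compression rate of sharing n b-bit weights among k centroids."""
    index_bits = math.log2(k)           # bits per stored cluster index
    original = n * b                    # dense storage
    shared = n * index_bits + k * b     # indices plus codebook
    return original / shared

# 16 weights, 4 shared values, 32-bit floats: 512 / (32 + 128)
rate = weight_sharing_rate(16, 4)
print(rate)  # 3.2
```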


Dynamic Network Surgery for Efficient DNNs (NIPS 2016; Yiwen Guo, Anbang Yao, Yurong Chen; Intel): Re-establishes mistakenly pruned connections if they later turn out to be important (splicing). The pruning threshold is set from the absolute (L1) magnitude of the weights. 17.7X compression for AlexNet. Code.
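A sketch of the prune/splice mask update as I understand it: weights below a low threshold are pruned, weights above a high threshold are spliced back in, and anything in between keeps its previous state. The thresholds `low` and `high` and the function name are illustrative assumptions; the key point is that masked weights keep being updated, so they can grow back.

```python
import numpy as np

def update_mask(weights, mask, low, high):
    """One mask update with pruning (below `low`) and splicing (at or above `high`)."""
    magnitude = np.abs(weights)
    new_mask = mask.copy()
    new_mask[magnitude < low] = 0    # prune: connection looks unimportant
    new_mask[magnitude >= high] = 1  # splice: connection became important again
    return new_mask

weights = np.array([0.05, 0.2, 0.5, 0.9])
mask = np.array([1, 1, 0, 0])
new_mask = update_mask(weights, mask, low=0.1, high=0.6)
print(new_mask)  # [0 1 0 1]
```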


CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization (CVPR 2018; Frederick Tung, Greg Mori; Simon Fraser University): A joint pruning-quantization approach in which pruning suppresses very small weights to zero, so that quantization works on a better range of the original values. (Not sure whether the pruned tensor is stored in a sparse representation.) For the conv layers of AlexNet on ImageNet, 30-40% of the weights are zeroed and 4-8 bit quantization is used (compression rate 4-8X) with no accuracy drop. Partly similar to Deep Compression.
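A much simplified sketch of the clip-then-quantize idea: zero the smallest-magnitude fraction of the weights, then quantize the survivors into evenly spaced bins, each represented by the mean of its members. This is my own reduction and not the paper's exact procedure (for example, a single magnitude cutoff is assumed here); `clip_fraction` and `bits` are illustrative parameters.

```python
import numpy as np

def clip_and_quantize(weights, clip_fraction, bits):
    """Zero small weights, then quantize the rest to at most 2**bits - 1 values."""
    magnitude = np.abs(weights)
    cutoff = np.quantile(magnitude, clip_fraction)
    keep = magnitude > cutoff
    out = np.zeros_like(weights)
    if not keep.any():
        return out
    kept = weights[keep]
    levels = 2 ** bits - 1  # one code is reserved for the clipped zeros
    edges = np.linspace(kept.min(), kept.max(), levels + 1)
    idx = np.clip(np.digitize(kept, edges) - 1, 0, levels - 1)
    # Each bin is represented by the mean of the weights that fall into it.
    centers = np.array([kept[idx == i].mean() if np.any(idx == i) else 0.0
                        for i in range(levels)])
    out[keep] = centers[idx]
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
q = clip_and_quantize(w, clip_fraction=0.4, bits=3)
```

Because the clipped weights are removed before the bins are laid out, no quantization level is wasted on the dense cluster of near-zero values.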


The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (ICLR 2019 Best Paper; Jonathan Frankle, Michael Carbin; MIT): The hypothesis: a randomly-initialized, dense neural network contains a subnetwork that is initialized such that, when trained in isolation, it can match the test accuracy of the original network after training for at most the same number of iterations. Pruning finds winning tickets that learn faster than the original network while reaching higher test accuracy and generalizing better. This helps a lot with training performance, network design, and the theoretical understanding of neural networks, but seems to offer little help in accelerating existing models at inference time. Code.
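A sketch of how a winning ticket could be searched for with iterative magnitude pruning: train, prune a fraction of the surviving weights, then reset the survivors to their original initial values. `train_fn` is a hypothetical stand-in for real training, and `toy_train` below exists only to make the sketch runnable.

```python
import numpy as np

def find_ticket(init_weights, train_fn, rounds, prune_per_round):
    """Return (ticket, mask); the ticket reuses the ORIGINAL initial values."""
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        # Always restart training from the initial weights of the survivors.
        trained = train_fn(init_weights * mask, mask)
        surviving = np.abs(trained[mask == 1])
        k = int(round(prune_per_round * surviving.size))
        if k > 0:
            threshold = np.sort(surviving)[k - 1]
            mask = mask * (np.abs(trained) > threshold)
    return init_weights * mask, mask

def toy_train(weights, mask):
    # Stand-in for real training (assumption): just rescales the weights.
    return weights * 1.5 * mask

rng = np.random.default_rng(0)
init = rng.normal(size=100)
ticket, mask = find_ticket(init, toy_train, rounds=3, prune_per_round=0.2)
```

The reset to `init_weights` is the part that distinguishes this from plain prune-and-retrain: the subnetwork keeps its structure from pruning but its values from the original initialization.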

Coarse-grained (Filter/Channel) Pruning

Coarse-grained (structured) pruning removes entire regular regions of weight tensors, such as filters or channels. The pruned models need no extra library support to run.


Structured Pruning of Deep Convolutional Neural Networks (2015; Sajid Anwar, Kyuyeon Hwang and Wonyong Sung; Seoul National University): An early attempt at pruning for inference time. Convolution is computed with an im2col-based approach and pruned as a matrix product. The importance weight of each particle is assigned by assessing the misclassification rate with the corresponding connectivity pattern. The pruned network is retrained to compensate for the losses due to pruning. Prunes the CIFAR-10 network by more than 70% with less than a 1% loss in accuracy. Not evaluated on the usual benchmark models.

Channel-level acceleration of deep face representations (2015; Adam Polyak and Lior Wolf; Tel Aviv University): The amount of information each channel contributes is measured by the variance of that channel's activation output (not by accuracy). It seems to skip the pruned channels when computing, so the shape of the data flow needs no change. The network is fine-tuned after pruning each layer, layer by layer, from the lowest to the topmost.
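A sketch of variance-based channel scoring, assuming activations laid out as (N, C, H, W). Pruned channels are masked to zero rather than removed, which matches my reading that the data flow shape stays unchanged; the function name is illustrative.

```python
import numpy as np

def mask_low_variance_channels(activations, n_prune):
    """Zero the n_prune channels whose activations vary the least."""
    # Variance of each channel over the batch and all spatial positions.
    variance = activations.var(axis=(0, 2, 3))
    pruned = np.argsort(variance)[:n_prune]
    keep = np.ones(activations.shape[1], dtype=bool)
    keep[pruned] = False
    return activations * keep[None, :, None, None], keep

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 3, 5, 5))
acts[:, 1] = 0.7  # a constant channel carries no information
masked, keep = mask_low_variance_channels(acts, n_prune=1)
```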


Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures (2016; Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang; HKUST, SenseTime): Measures neuron importance via the Average Percentage of Zeros (APoZ), then prunes and retrains iteratively. 2~3X pruning in some layers results in better accuracy, even for VGG-16.
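APoZ can be sketched in one line: for each channel, the fraction of post-ReLU activations that are exactly zero over a batch of inputs. The (N, C, H, W) layout and the toy data are assumptions for illustration.

```python
import numpy as np

def apoz(activations):
    """Average Percentage of Zeros per channel for post-ReLU maps (N, C, H, W)."""
    return np.mean(activations == 0, axis=(0, 2, 3))

rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(size=(4, 3, 5, 5)), 0)  # ReLU output
acts[:, 1] = 0  # a channel that never fires
scores = apoz(acts)
most_prunable = int(np.argmax(scores))  # highest APoZ is pruned first
```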


Pruning Convolutional Neural Networks for Resource Efficient Inference (ICLR 2017; Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz; Nvidia): Uses a Taylor expansion as the pruning criterion. 2X FLOPs reduction with a 2.3% accuracy drop for VGG-16. Code.
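As I understand the criterion, a first-order Taylor expansion estimates the change in loss from removing a feature map as the absolute value of the mean of activation times gradient, averaged over examples and L2-normalized within the layer. A sketch under that assumption, with random arrays standing in for real activations and gradients:

```python
import numpy as np

def taylor_scores(activations, gradients):
    """First-order Taylor importance per channel; inputs are (N, C, H, W)."""
    # |mean over spatial positions of activation * dLoss/dActivation|, per example.
    per_example = np.abs((activations * gradients).mean(axis=(2, 3)))
    scores = per_example.mean(axis=0)  # average over the batch
    return scores / (np.linalg.norm(scores) + 1e-12)  # per-layer L2 normalization

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 3, 5, 5))
grads = rng.normal(size=(4, 3, 5, 5))
grads[:, 2] = 0  # the loss does not depend on channel 2
scores = taylor_scores(acts, grads)
```

The channel with the lowest score is pruned first; here that is channel 2, whose removal does not change the loss to first order.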

Pruning Filters for Efficient ConvNets (ICLR 2017; Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf; University of Maryland, NEC Labs America): Prunes whole filters rather than relying on sparse computation. The relative importance of a filter in each layer is measured by the sum of its absolute weights (its L1-norm), and the filters with the smallest sums are pruned (better than measuring activation output). Layers with relatively flat slopes (usually the first several layers) are more sensitive to pruning. Two pruning strategies: Independent (does not touch other layers) and Greedy (accounts for filters removed in previous layers). Also handles multi-branch networks. Prunes ~30% of FLOPs while keeping the accuracy drop to ~0.1%.
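A sketch of L1-norm filter pruning for one conv layer, assuming weights laid out as (out_channels, in_channels, kH, kW). Removing a filter also removes the matching input channel of the next layer, which is why the result is a smaller dense model; the function name is mine.

```python
import numpy as np

def prune_filters_l1(conv_w, next_w, n_prune):
    """Drop the n_prune filters with the smallest L1-norm."""
    scores = np.abs(conv_w).sum(axis=(1, 2, 3))  # one L1-norm per filter
    keep = np.sort(np.argsort(scores)[n_prune:])  # indices of surviving filters
    # The next layer loses the input channels fed by the removed filters.
    return conv_w[keep], next_w[:, keep], keep

rng = np.random.default_rng(0)
conv_w = rng.normal(size=(8, 3, 3, 3))
next_w = rng.normal(size=(16, 8, 3, 3))
pruned_w, pruned_next, keep = prune_filters_l1(conv_w, next_w, n_prune=2)
```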

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV 2017; Yihui He, Xiangyu Zhang and Jian Sun; Xi’an Jiaotong University, Megvii): An iterative two-step algorithm that prunes each layer by LASSO-regression-based channel selection followed by least-squares reconstruction. The algorithm is further generalized to multi-layer and multi-branch cases. 5X speed-up for VGG-16 with only a 0.3% increase in error. Project.

Compact Deep Convolutional Neural Networks With Coarse Pruning (ICLR 2017; Sajid Anwar and Wonyong Sung; Seoul National University): Evaluates N random combinations and computes the misclassification rate (MCR) for each one, then chooses the pruning mask that causes the least degradation of network performance on the validation set.
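The random search above can be sketched as follows. `mcr_fn` is a hypothetical callback that would evaluate the masked network on the validation set; the toy version below only pretends that some filters matter more than others.

```python
import numpy as np

def best_random_mask(n_filters, n_keep, n_trials, mcr_fn, rng):
    """Try n_trials random masks and return the one with the lowest MCR."""
    best_mask, best_mcr = None, float("inf")
    for _ in range(n_trials):
        mask = np.zeros(n_filters, dtype=bool)
        mask[rng.choice(n_filters, size=n_keep, replace=False)] = True
        mcr = mcr_fn(mask)  # misclassification rate with this mask applied
        if mcr < best_mcr:
            best_mask, best_mcr = mask, mcr
    return best_mask, best_mcr

# Toy stand-in for validation (assumption): dropping important filters costs more.
importance = np.linspace(0.0, 1.0, 8)
toy_mcr = lambda mask: float(importance[~mask].sum())

rng = np.random.default_rng(0)
best_mask, best_mcr = best_random_mask(8, 6, 50, toy_mcr, rng)
```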

Runtime Neural Pruning (NIPS 2017; Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou; Tsinghua University): Preserves the original network and prunes adaptively according to the input image and the current feature maps. Pruning is performed in a bottom-up, layer-by-layer manner, which is modeled as a Markov decision process and trained with reinforcement learning. I have real doubts about this method: patterns seen in some images can hardly predict the coming ones.


“Learning-Compression” Algorithms for Neural Net Pruning (CVPR 2018; Miguel A. Carreira-Perpinan, Yerlan Idelbayev; University of California): A generic algorithm that optimizes a regularized, data-dependent loss and marks weights for pruning in a data-independent way. It explores subsets of weights rather than committing irrevocably to a specific subset from the beginning, and learns the best number of weights to prune in each layer. Model results are on CIFAR-10.

Structured Probabilistic Pruning for Convolutional Neural Network Acceleration (BMVC 2018; Huan Wang, Qiming Zhang, Yuehai Wang, and Haoji Hu; Zhejiang University): Assigns each weight a pruning probability, which guides pruning. The pruning probabilities are increased or decreased based on importance criteria during training. 4X for AlexNet and 2X for ResNet-50, with < 1% Top-5 accuracy drop. The related work part is good.

AMC: AutoML for Model Compression and Acceleration on Mobile Devices (ECCV 2018; Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han; MIT, CMU, Google): A learning-based compression policy outperforms conventional rule-based compression policies, with a higher compression ratio, better accuracy preservation, and less human labor. MobileNet achieved a 1.81x speedup on an Android phone with only 0.1% loss of Top-1 accuracy. The “Comparison with Heuristic Channel Reduction” part is good. Code and models.

PocketFlow: An Automated Framework for Compressing and Accelerating Deep Neural Networks (CVPR 2018 CDNNRIA workshop; Jiaxiang Wu, Yao Zhang, Haoli Bai, Huasong Zhong, Jinlong Hou, Wei Liu, Wenbing Huang, Junzhou Huang; Tencent): An automated framework for model compression and acceleration, which integrates a series of model compression algorithms and embeds a hyper-parameter optimization module to automatically search for the optimal combination of hyper-parameters. Project, and doc.

Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers (ICLR 2018; Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang; The Pennsylvania State University, Adobe): TODO Code.


Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration (CVPR 2019; Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang; University of Technology Sydney): Filter Pruning via Geometric Median (FPGM) prunes filters that are redundant, rather than filters of “relatively less” importance. Reduces the FLOPs of ResNet-101 by 42% with no Top-5 accuracy drop. Code.
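A sketch of the selection rule as I understand it: within a layer, the filters with the smallest total distance to all other filters sit closest to the geometric median, so the remaining filters can best replace them. The toy weights are chosen so that a norm-based criterion would pick a different filter; the function name is illustrative.

```python
import numpy as np

def fpgm_select(conv_w, n_prune):
    """Indices of the n_prune filters closest to the layer's geometric median."""
    flat = conv_w.reshape(conv_w.shape[0], -1)
    # Pairwise Euclidean distances between all filters of the layer.
    dist = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=2)
    total = dist.sum(axis=1)  # small total distance means highly replaceable
    return np.sort(np.argsort(total)[:n_prune])

# Five tiny "filters": one in the middle, four spread around it.
points = np.array([[100, 100], [110, 100], [90, 100], [100, 110], [100, 90]], dtype=float)
conv_w = points.reshape(5, 1, 1, 2)

redundant = fpgm_select(conv_w, n_prune=1)
smallest_norm = int(np.argmin(np.linalg.norm(points, axis=1)))
```

Here `redundant` picks the central filter (index 0), while the smallest-norm filter is a different one, which is the contrast with the smaller-norm-less-important criterion that the entry describes.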