As neural networks have become more accurate, they’ve also ballooned in size and computational cost. That makes many state-of-the-art models impractical to run on phones and other smaller, less powerful devices. A new technique makes convolutional neural networks much less computationally intensive without significantly degrading their performance.
What’s new: Zhonghui You and colleagues at Peking University and Momenta, an autonomous-vehicle startup, propose a way to remove parameters that aren’t critical to a model’s performance: Gate Decorator.
Key insight: The new technique removes functional groups of parameters (specifically convolutional filters), rather than individual parameters.
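To see why filters make a natural unit to prune, consider how they’re laid out in a typical framework. Here’s a minimal PyTorch illustration (ours, not the authors’ code): a convolutional layer’s weight tensor stacks its filters along the first dimension, so deleting a filter simply shrinks the layer.

```python
import torch
import torch.nn as nn

# A Conv2d weight tensor has shape (out_channels, in_channels, k, k).
# Each filter is one slice along the first dimension, so pruning a filter
# shrinks this layer's output channels (and the next layer's input channels).
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
print(conv.weight.shape)  # torch.Size([32, 16, 3, 3])

keep = [i for i in range(32) if i != 7]       # drop filter 7
smaller = nn.Conv2d(16, 31, kernel_size=3)    # one fewer output channel
smaller.weight.data.copy_(conv.weight.data[keep])
smaller.bias.data.copy_(conv.bias.data[keep])
```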
How it works: Gate Decorator multiplies each filter’s output by a scaling factor that represents the filter’s importance to the model’s output. It ranks filters by their estimated impact on the loss function, then removes the least important ones.
- The model processes a subset of the training data to learn a scaling-factor value for each filter. The original model’s parameters stay frozen at their existing values.
- The scaling factors are randomly initialized. Training pushes each one toward the smallest value that, when it multiplies the filter’s output, degrades performance the least.
- A user-specified fraction of the filters with the smallest scaling factors is deleted. The pruned network is fine-tuned on the entire training set.
- The process is repeated for a user-defined number of iterations, as the sketch after this list illustrates.
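Here’s a minimal PyTorch sketch of that loop. The gate mechanism is simplified, and the L1 sparsity penalty plus the helper names (`remove_filters_below`, `fine_tune`, the data loaders) are our assumptions rather than the paper’s exact recipe:

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Multiplies each output channel of a conv layer by a learnable gate.

    A simplified stand-in for the paper's gate mechanism; not the authors'
    exact formulation.
    """
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = conv
        self.gate = nn.Parameter(torch.rand(conv.out_channels))  # random init

    def forward(self, x):
        # Broadcast the per-filter gate over the (N, C, H, W) activation map.
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

def train_gates(model, subset_loader, loss_fn, sparsity_weight=1e-3):
    """Learn gate values on a data subset; all other weights stay frozen."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("gate")
    gates = [p for name, p in model.named_parameters()
             if name.endswith("gate")]
    opt = torch.optim.SGD(gates, lr=0.01)
    for x, y in subset_loader:
        loss = loss_fn(model(x), y)
        # L1 pressure pushes gates on unimportant filters toward zero
        # (our assumed form of the sparsity penalty).
        loss = loss + sparsity_weight * sum(g.abs().sum() for g in gates)
        opt.zero_grad()
        loss.backward()
        opt.step()

def gate_threshold(model, fraction):
    """Gate magnitude below which the given fraction of filters falls."""
    all_gates = torch.cat([p.detach().abs().flatten()
                           for name, p in model.named_parameters()
                           if name.endswith("gate")])
    k = max(1, int(fraction * all_gates.numel()))
    return all_gates.kthvalue(k).values

# The outer loop: estimate importance, prune, fine-tune, repeat.
# `remove_filters_below` (physically deleting filters and reshaping
# downstream layers) and `fine_tune` are hypothetical helpers.
# for _ in range(num_iterations):
#     train_gates(model, subset_loader, loss_fn)
#     threshold = gate_threshold(model, fraction=0.1)
#     remove_filters_below(model, threshold)
#     fine_tune(model, full_loader)
```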
Results: The researchers compared the accuracy and computational cost of original and pruned networks. Gate Decorator cut the computational cost of an ImageNet-trained ResNet by 55 percent and a CIFAR-trained ResNet by 70 percent. Accuracy for these models decreased by 0.67 percent and increased by 0.03 percent, respectively. That’s state-of-the-art accuracy for such a reduction in computational cost.
Why it matters: Unlike most weight-pruning techniques, Gate Decorator’s efficiency gains are straightforward to realize in practice, not just in theory. A model shorn of whole filters is still a dense network, so it runs on existing software and hardware. Removing individual weights from a densely connected network, by contrast, yields sparse matrices whose theoretical speedups depend on specialized sparse-computation algorithms that don’t yet exist in mature form.
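A toy PyTorch comparison (our illustration) makes the contrast concrete:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 56, 56)

# Structured pruning shrinks the tensors themselves: a layer that kept 16 of
# 32 filters does half the work using ordinary dense convolution kernels.
pruned = nn.Conv2d(16, 16, kernel_size=3)
out = pruned(x)                               # standard op, real speedup

# Unstructured pruning only zeroes entries. Shapes, and therefore dense
# FLOPs, are unchanged; a real speedup needs specialized sparse kernels.
full = nn.Conv2d(16, 32, kernel_size=3)
mask = (torch.rand_like(full.weight) > 0.5).float()
full.weight.data *= mask                      # half the weights are zero
out = full(x)                                 # same cost as before pruning
```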
We’re thinking: A pruning method like this might work with other parameter groupings to cut the computational demand of architectures beyond CNNs. The resulting models could be further compressed using other methods such as quantization.
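For instance, here’s a back-of-the-envelope sketch of symmetric 8-bit weight quantization (our illustration, not a production recipe), which would shrink a pruned model’s weights a further 4x relative to 32-bit floats:

```python
import torch

w = torch.randn(32, 16, 3, 3)                    # a pruned conv layer's weights
scale = w.abs().max() / 127                      # symmetric 8-bit quantization
w_int8 = torch.round(w / scale).to(torch.int8)   # 4x smaller than float32
w_deq = w_int8.float() * scale                   # dequantize to approximate w
```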