Weight Loss for AI

Published

May 15, 2019

Larger neural networks can deliver better results, yet researchers found a way to make deep learning models perform just as well at roughly one-tenth the size. Their work won a best paper award at the 2019 International Conference on Learning Representations (ICLR).

What’s new: Researchers at MIT developed a procedure to identify, within a trained network, a much smaller subnetwork that, when trained on its own from the original initialization, learns the desired task about as quickly and accurately as its bulky counterpart.

How it works: The researchers started with fully connected and convolutional feed-forward architectures. They initialized each network randomly and trained it over a number of iterations. Then they trimmed off the connections with the smallest-magnitude weights in each layer. Finally, they reset the remaining connections to their initialization values and retrained. They repeated this train-prune-reset cycle several times to reach high performance at a compact size.
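The cycle described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the researchers' code: the function names, the `train` callback (a hypothetical stand-in for whatever training routine is in use, which should keep pruned connections at zero), and the default of pruning 20 percent per round are all chosen here for the example.

```python
import numpy as np

def prune_lowest_magnitude(weights, mask, fraction):
    """Return a copy of `mask` with the smallest-magnitude surviving weights removed."""
    surviving = np.flatnonzero(mask)            # flat indices of connections still alive
    n_prune = int(fraction * surviving.size)    # how many to remove this round
    order = np.argsort(np.abs(weights.ravel()[surviving]))
    new_mask = mask.flatten()                   # flatten() always returns a copy
    new_mask[surviving[order[:n_prune]]] = False
    return new_mask.reshape(mask.shape)

def find_winning_ticket(init_weights, train, rounds=3, fraction=0.2):
    """Iteratively train, prune each layer, and reset survivors to their initial values.

    `init_weights` is a list of per-layer arrays. `train(weights, masks)` is a
    hypothetical callback that returns trained per-layer arrays of the same shapes.
    """
    masks = [np.ones(w.shape, dtype=bool) for w in init_weights]
    for _ in range(rounds):
        # Every round starts from the ORIGINAL initialization, with pruned
        # connections zeroed out.
        start = [w * m for w, m in zip(init_weights, masks)]
        trained = train(start, masks)
        # Prune per layer, ranking only the connections that are still alive.
        masks = [prune_lowest_magnitude(w, m, fraction)
                 for w, m in zip(trained, masks)]
    ticket = [w * m for w, m in zip(init_weights, masks)]
    return ticket, masks
```

The returned `ticket` holds the original initialization values for the surviving connections and zeros elsewhere, so it is the starting point for one last training run rather than a trained model.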

Why it matters: Researchers Jonathan Frankle and Michael Carbin built on earlier work on pruning neural networks, but they achieved much more dramatic results. They hypothesize that such high-performance subnetworks exist in any dense, randomly initialized network, and that which connections form them depends on the weights assigned during initialization. The researchers call these subnetworks “winning tickets,” and they propose an algorithm to identify them. Winning tickets require a fraction of the usual memory resources, computational power, and energy. Spotting them early in the model-building process might yield immense savings.

To be sure: The researchers pruned only networks devoted to computer vision and trained on small data sets. It’s not clear whether the results would be equally impressive otherwise. Moreover, their method currently requires training a network more than a dozen times, which makes it computationally expensive.
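A quick back-of-the-envelope calculation shows why the training count climbs so fast. The sketch below assumes, purely for illustration, that each round prunes 20 percent of the surviving connections and that the goal is a network at one-tenth the original size; neither figure is a setting stated in this article, and `pruning_rounds` is a hypothetical helper.

```python
def pruning_rounds(target_fraction, prune_rate):
    """Count the rounds needed to shrink a network to `target_fraction` of its size."""
    remaining, rounds = 1.0, 0
    while remaining > target_fraction:
        remaining *= 1.0 - prune_rate   # each round keeps (1 - prune_rate) of survivors
        rounds += 1
    return rounds, remaining

rounds, remaining = pruning_rounds(target_fraction=0.10, prune_rate=0.20)
print(rounds, round(remaining, 3))  # prints: 11 0.086
```

Under these assumptions, each of the 11 rounds costs one full training run, and the final subnetwork still has to be retrained once more, so the total is about a dozen training runs for a single compact network.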

What’s next: The researchers aim to identify winning tickets early, making it possible to build compact networks from the start. They’ll be studying winning tickets in hope of discovering more powerful architectures and initialization methods.