What Machines Want to See An image compressor for more accurate computer vision

Published

May 26, 2021

Reading time

1 min read

Researchers typically downsize images for vision networks to accommodate limited memory and accelerate processing. A new method not only compresses images but yields better classification.

What’s new: Hossein Talebi and Peyman Milanfar at Google built a learned image preprocessor that improved the accuracy of image recognition models trained on its output.

Key insight: Common approaches to downsizing, such as bilinear and bicubic methods, interpolate between pixels to determine the colors of pixels in a smaller version of an image. Information is lost in the process, which may degrade the performance of models trained on them. One solution is to train separate models that perform resizing and classification together.
How it works: The network comprises a bilinear resizer layer sandwiched between convolutional layers to enable it to accept any input image size.

The authors downsized ImageNet examples to 224x224 using a garden-variety bilinear resizer and used them to train a DenseNet-121. This resizer-classifier pair served as a baseline.
They further trained the DenseNet-121 while training their resizer jointly, optimizing for both classification accuracy and input size.

Results: The authors’ approach achieved top-5 error on ImageNet of 10.8 percent. The baseline model achieved 12.8 percent.

Yes, but: The proposed method consumed 35 percent more processing power (7.65 billion FLOPS) than the baseline (5.67 billion FLOPS).

Why it matters: Machine learning engineers have adopted conventional resizing methods without considering their impact on performance. If we must discard information, we can devise an algorithm that learns to keep what’s the most important.

We’re thinking: In between training vision networks, you might use this image processor to produce mildly interesting digital art.

Subscribe to The Batch