Convolution Plus

Reading time
1 min read
Attention-augmented convolution

The technique known as attention has yielded spectacular results in speech and natural language tasks. Now researchers have shown that it can improve image recognizers as well.

What’s new: A residual neural network incorporating self-attention layers beat the state of the art accuracy on ImageNet classification by 1.3% over a ResNet50 baseline, according to a new paper. It also beat the best Common Objects in Context object detection score over a RetinaNet baseline. This is not the first use of attention in vision tasks, but it outperforms earlier efforts.

How it works: Quoc Le and his colleagues augmented convolution layers with multi-headed self-attention layers. The resulting network uses attention in parallel with convolution and concatenates the outputs produced by each. Attention extends a network’s awareness beyond the data it’s processing at a given moment. Adding it to a convolutional neural network enables the model to consider relations between different areas of an image.

Why it matters: The attention networks generally require fewer parameters than convolution layers to achieve an accuracy target. That makes it possible to improve performance in a same-sized model.

Bottom line: Image recognizers based on CNNs already achieve high accuracy and sensitivity, and adding attention makes them even sharper.


Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox