Swin

3 Posts

Shifted Patch Tokenization (SPT) | Locality Self-Attention (LSA)

Less Data for Vision Transformers

Vision Transformer (ViT) outperformed convolutional neural networks in image classification, but it required more training data. New work enabled ViT and its variants to outperform other architectures with less training data.
2 min read
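
The techniques named in the caption above, Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), are simple to sketch. Below is a minimal, illustrative PyTorch version of LSA, which replaces the fixed 1/sqrt(d) attention scale with a learnable temperature and masks the diagonal so no token attends to itself. The class and variable names are our own illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class LocalitySelfAttention(nn.Module):
    """Sketch of Locality Self-Attention (LSA): standard self-attention
    with (1) a learnable temperature in place of the fixed 1/sqrt(d)
    scale and (2) the diagonal of the attention matrix masked so each
    token cannot attend to itself."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        head_dim = dim // heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Learnable temperature, initialized to the usual sqrt(d) scale.
        self.temperature = nn.Parameter(torch.tensor(head_dim ** 0.5))

    def forward(self, x):                      # x: (batch, tokens, dim)
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.temperature
        # Diagonal mask: forbid each token from attending to itself.
        mask = torch.eye(n, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float('-inf')).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)

# Usage: 65 tokens (8x8 patches plus a class token) of width 64.
x = torch.randn(2, 65, 64)
print(LocalitySelfAttention(64)(x).shape)  # torch.Size([2, 65, 64])
```
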
Overview of Mobile-Former | Cross attention over the entire feature map for the first token in Mobile→Former

High Accuracy at Low Power

Equipment that relies on computer vision while unplugged, such as mobile phones, drones, satellites, and autonomous cars, needs power-efficient models. A new architecture set a record for accuracy per computation.
2 min read
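
As the caption above suggests, Mobile-Former lets a small set of learnable global tokens cross-attend over the CNN's feature map. The PyTorch sketch below illustrates that Mobile→Former direction under our own naming and projection choices; it is not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MobileToFormerAttention(nn.Module):
    """Sketch of the Mobile->Former direction: a handful of global
    tokens (queries) cross-attend over the flattened CNN feature map
    (keys/values), letting each token summarize the whole image at
    low cost. Names and projections here are illustrative."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, tokens, feature_map):
        # tokens:      (batch, num_tokens, dim)    e.g. 6 global tokens
        # feature_map: (batch, dim, height, width) from the mobile branch
        b, d, h, w = feature_map.shape
        kv = feature_map.flatten(2).transpose(1, 2)   # (b, h*w, dim)
        q = self.q_proj(tokens)
        attn = (q @ kv.transpose(1, 2)) * self.scale  # (b, num_tokens, h*w)
        out = attn.softmax(dim=-1) @ kv               # (b, num_tokens, dim)
        return tokens + self.out_proj(out)            # residual update

# Usage: 6 tokens of width 192 attending over a 14x14 feature map.
tokens = torch.randn(2, 6, 192)
fmap = torch.randn(2, 192, 14, 14)
print(MobileToFormerAttention(192)(tokens, fmap).shape)  # torch.Size([2, 6, 192])
```
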
Animated image showing a transformer architecture processing an image

Transformer Speed-Up Sped Up: How to Speed Up Image Transformers

The transformer architecture is notoriously inefficient when processing long sequences, a problem in processing images, which are essentially long sequences of pixels. One way around this is to break up input images and process the pieces separately.
1 min read
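
To make the "break up input images" idea concrete, here is a minimal sketch of non-overlapping window partitioning in the style of Swin; the function name and shapes are our own illustration. Attention then runs within each small window rather than across the full image.

```python
import torch

def window_partition(x, window_size):
    """Sketch of window partitioning: split a feature map into
    non-overlapping windows so attention runs over window_size**2
    tokens per window instead of all H*W tokens at once."""
    b, h, w, c = x.shape                               # channels-last
    x = x.view(b, h // window_size, window_size,
               w // window_size, window_size, c)
    # Gather the windows into the batch dimension.
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size ** 2, c)
    return windows                                     # (num_windows*b, tokens, c)

# A 56x56 map with 7x7 windows yields 64 windows of 49 tokens each,
# so attention cost scales with 49**2 per window rather than 3136**2.
x = torch.randn(1, 56, 56, 96)
print(window_partition(x, 7).shape)  # torch.Size([64, 49, 96])
```
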
