Less Data for Vision Transformers

Vision Transformer (ViT) outperformed convolutional neural networks in image classification, but it required more training data than they did. New work introduced Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), enabling ViT and variants such as CaiT to outperform other architectures with less training data.
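The work rests on two techniques, Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA). A rough NumPy sketch of both, under the assumption that SPT concatenates half-patch diagonally shifted copies of the image before cutting it into patches, and that LSA replaces the fixed softmax scale with a learnable temperature while masking each token's attention to itself (`np.roll` stands in for a zero-padded shift, and `temperature` would be a learnable scalar in a real model):

```python
import numpy as np

def shifted_patch_tokenize(img, patch):
    """SPT sketch: widen each patch's context before tokenization.

    Concatenates the image with four half-patch diagonally shifted copies
    along the channel axis, then cuts the result into non-overlapping
    patches, so each token sees beyond its own patch boundary.
    """
    H, W, C = img.shape
    s = patch // 2
    shifts = [(s, s), (s, -s), (-s, s), (-s, -s)]
    views = [img] + [np.roll(img, sh, axis=(0, 1)) for sh in shifts]
    x = np.concatenate(views, axis=-1)                      # (H, W, 5C)
    # Flatten into (num_patches, patch * patch * 5C) tokens.
    x = x.reshape(H // patch, patch, W // patch, patch, 5 * C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 5 * C)

def lsa_weights(q, k, temperature):
    """LSA sketch: sharpened attention for small datasets.

    Scales attention scores by a (learnable) temperature instead of the
    fixed sqrt(d), and masks the diagonal so no token attends to itself,
    pushing attention mass onto neighboring tokens.
    """
    scores = q @ k.T / temperature
    np.fill_diagonal(scores, -np.inf)                       # diagonal masking
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)
```

With an 8x8 RGB image and 4-pixel patches, `shifted_patch_tokenize` yields 4 tokens of length 4 * 4 * 15 = 240, and each row of `lsa_weights` sums to 1 with a zero diagonal.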