Transformers


Graph shows Ernie-4.5 outperforming competitors in document understanding and visual reasoning tasks.

Baidu’s Multimodal Bids: Giant Ernie 5 natively generates multiple media; Ernie-4.5-VL-28B-A3B-Thinking tops Vision-Language metrics

Baidu debuted two models: a lightweight, open-weights, vision-language model and a giant, proprietary, multimodal model built to take on U.S. competitors.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between each.
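
As a concrete illustration of that loop, here's a minimal sketch of an agent that alternates between private reasoning and sequential tool calls over many steps. call_model and run_tool are stubs invented for the example; they stand in for a real LLM API and real tools, not Kimi K2's actual interface.

```python
# Minimal agentic loop: the model "thinks", optionally requests a tool,
# observes the result, and repeats for up to hundreds of sequential steps.
# call_model and run_tool are stubs (assumptions for the example).

import json

def call_model(messages):
    """Stub LLM call. A real system would query an LLM API here."""
    if len(messages) < 6:
        return {
            "thinking": "I still need more data before answering.",
            "tool_call": {"name": "search", "arguments": {"query": "example"}},
        }
    return {"thinking": "I have enough information.", "answer": "Done."}

def run_tool(name, arguments):
    """Stub tool executor. A real system would dispatch to real tools."""
    return f"results for {json.dumps(arguments)} from {name}"

def agent_loop(task, max_steps=300):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):                 # many sequential steps
        reply = call_model(messages)
        # Keep the model's intermediate reasoning in context between calls.
        messages.append({"role": "assistant", "content": reply["thinking"]})
        if "answer" in reply:
            return reply["answer"]
        tool = reply["tool_call"]
        observation = run_tool(tool["name"], tool["arguments"])
        messages.append({"role": "tool", "content": observation})
    return "step budget exhausted"

print(agent_loop("Compile a brief report on solar capacity trends."))
```
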
Series of graphs transformed via tokenization and transformer layers, resulting in predicted outputs.

Forecasting Multiple Time Series: Amazon’s Chronos-2 sorts out tangled variables to make better predictions

Transformers are well suited to predicting future values of time series like energy prices, wages, or weather, but often, as in those examples, multiple time series influence one another. Researchers built a model that can forecast multiple time series simultaneously.
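
For intuition, here's a toy sketch of joint forecasting: every time step carries the values of all series, so the encoder can let one series inform predictions for another. The architecture and sizes are illustrative assumptions, not Chronos-2's design.

```python
# Toy joint forecaster: all series enter the model together, so each
# forecast can depend on every other series. Not Chronos-2's architecture.

import torch
import torch.nn as nn

NUM_SERIES, CONTEXT, HORIZON = 3, 48, 12   # e.g., price, demand, temperature

class JointForecaster(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Linear(NUM_SERIES, d_model)   # one time step = all series
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, NUM_SERIES * HORIZON)

    def forward(self, x):                    # x: (batch, CONTEXT, NUM_SERIES)
        h = self.encoder(self.embed(x))      # mixes information across time and series
        out = self.head(h[:, -1])            # predict the horizon from the last position
        return out.view(-1, HORIZON, NUM_SERIES)

model = JointForecaster()
history = torch.randn(8, CONTEXT, NUM_SERIES)   # fake batch of joint histories
print(model(history).shape)                      # torch.Size([8, 12, 3])
```
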
A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.
Graphs compare DeepSeek models showing reduced cost per million tokens with V3.2-Exp over V3.1-Terminus.

DeepSeek Cuts Inference Costs: DeepSeek-V3.2-Exp streamlines processing using a "lightning indexer," boosting efficiency

DeepSeek’s latest large language model can cut inference costs by more than half and process long contexts dramatically faster than its predecessor.
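
The sketch below illustrates the general idea of indexer-guided sparse attention under simple assumptions: a cheap scorer ranks earlier tokens, and full attention runs only over each query's top-k picks. It's a rough illustration, not DeepSeek's implementation of the lightning indexer.

```python
# Illustrative top-k sparse attention guided by a cheap scorer. The scorer's
# output is random here; in a real system it would be a lightweight learned
# module. This is a general sketch, not DeepSeek's implementation.

import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, index_scores, top_k=64):
    """q, k, v: (T, d). index_scores: (T, T) cheap relevance scores per query."""
    T, d = q.shape
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = index_scores.masked_fill(~causal, float("-inf"))
    keep = scores.topk(min(top_k, T), dim=-1).indices   # top-k past tokens per query

    # Full attention is computed only over the selected (and causal) positions.
    allowed = torch.zeros(T, T, dtype=torch.bool)
    allowed.scatter_(1, keep, torch.ones_like(keep, dtype=torch.bool))
    allowed &= causal
    attn = (q @ k.T / d ** 0.5).masked_fill(~allowed, float("-inf"))
    return F.softmax(attn, dim=-1) @ v

T, d = 512, 64
q, k, v = (torch.randn(T, d) for _ in range(3))
cheap_scores = torch.randn(T, T)      # stand-in for the indexer's low-cost scores
print(sparse_attention(q, k, v, cheap_scores).shape)   # torch.Size([512, 64])
```
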
Energy-Based Transformer refines predictions step by step, lowering energy for higher context compatibility.

Transformers Energized: Energy-Based Transformers (EBTs) use gradient descent to gradually predict the next token

A new type of transformer can check its work. Instead of guessing the next output token in one shot like a typical transformer, it starts with a rough version of the token and improves it step by step.
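
Here's a minimal sketch of that refinement loop, with a tiny toy network standing in for the model's learned energy function: a candidate embedding is nudged downhill for a few gradient steps until it fits the context better, that is, until its energy is lower.

```python
# Energy-based refinement: start from a rough candidate and take gradient
# steps that lower a learned energy measuring how well the candidate fits
# the context. The small energy network is a toy stand-in.

import torch
import torch.nn as nn

d = 32
# Toy energy function: maps (context, candidate) to a scalar "incompatibility".
energy_net = nn.Sequential(nn.Linear(2 * d, 64), nn.GELU(), nn.Linear(64, 1))

def refine_prediction(context_vec, steps=10, lr=0.1):
    candidate = torch.randn(d, requires_grad=True)        # rough initial guess
    for _ in range(steps):
        energy = energy_net(torch.cat([context_vec, candidate])).squeeze()
        grad, = torch.autograd.grad(energy, candidate)    # d(energy)/d(candidate)
        with torch.no_grad():
            candidate -= lr * grad                        # step toward lower energy
    return candidate.detach()

context = torch.randn(d)                  # stand-in for the encoded context
print(refine_prediction(context).shape)   # torch.Size([32])
```
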
Diagram of Qwen3-Next architecture with Mixture of Experts, Gated Attention, and Gated DeltaNet layers.

Qwen3-Next Accelerates: Alibaba’s new model uses hybrid attention layers and a sparse MoE architecture for speed and performance

Alibaba updated its popular Qwen3 open-weights models with a number of fresh, speed-boosting tweaks.
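
The sketch below shows the general shape of such a hybrid stack under stated assumptions: a cheap, linear-time token mixer in most blocks (a depthwise convolution here, not Qwen3-Next's Gated DeltaNet), full attention every fourth block, and a top-k mixture-of-experts feed-forward in each block. The 3:1 ratio and all sizes are placeholders.

```python
# Hybrid stack sketch: cheap mixers in most blocks, full attention in a few,
# and a top-k MoE feed-forward everywhere. Mixer choice, ratio, and sizes
# are assumptions for illustration, not Qwen3-Next's configuration.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d, num_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(num_experts))
        self.router = nn.Linear(d, num_experts)
        self.k = k

    def forward(self, x):
        weights, idx = self.router(x).softmax(-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):            # each token visits only k experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(1) * expert(x[mask])
        return out

def make_block(d, use_full_attention):
    mixer = (nn.MultiheadAttention(d, 4, batch_first=True) if use_full_attention
             else nn.Conv1d(d, d, 3, padding=1, groups=d))   # cheap stand-in mixer
    return nn.ModuleDict({"mixer": mixer, "moe": TopKMoE(d)})

d = 64
blocks = [make_block(d, use_full_attention=(i % 4 == 3)) for i in range(8)]

x = torch.randn(2, 128, d)
for block in blocks:
    m = block["mixer"]
    if isinstance(m, nn.MultiheadAttention):
        x = x + m(x, x, x, need_weights=False)[0]          # full attention block
    else:
        x = x + m(x.transpose(1, 2)).transpose(1, 2)       # linear-time mixer block
    x = x + block["moe"](x)
print(x.shape)    # torch.Size([2, 128, 64])
```
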
Diagram comparing sliding window attention and ATLAS memory, showing wider context tracking in ATLAS.

10 Million Tokens of Input Context: ATLAS, a transformer-like architecture, can process a context window as large as ten million tokens

An alternative to attention enables large language models to track relationships among words across extraordinarily wide spans of text.
STORM pipeline overview: Mamba layers link the image encoder and LLM, adding temporal info to tokens and reducing image tokens without losing key details.

Better Video, Fewer Tokens: STORM processes fewer tokens and still beats GPT-4o on video understanding benchmarks

Researchers reduced the number of tokens needed to represent video frames to be fed to a transformer.
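
As a rough illustration of the token-reduction step, the sketch below mixes information across time and then pools groups of consecutive frames before handing tokens to the language model. The 1-D convolution is a placeholder for STORM's Mamba layers, and the pooling factor is an assumption.

```python
# Token reduction sketch: mix information across time, then pool consecutive
# frames so far fewer visual tokens reach the language model. The temporal
# mixer here is a placeholder, not STORM's Mamba layers.

import torch
import torch.nn as nn

frames, tokens_per_frame, d = 32, 196, 512
visual_tokens = torch.randn(1, frames, tokens_per_frame, d)   # from an image encoder

temporal_mixer = nn.Conv1d(d, d, kernel_size=3, padding=1)    # stand-in for Mamba

# Mix information across time for each spatial token position.
x = visual_tokens.permute(0, 2, 3, 1).reshape(-1, d, frames)  # (196, d, frames)
x = temporal_mixer(x).reshape(1, tokens_per_frame, d, frames).permute(0, 3, 1, 2)

# Pool every 4 consecutive frames into one, cutting tokens 4x before the LLM.
pooled = x.reshape(1, frames // 4, 4, tokens_per_frame, d).mean(dim=2)
print(pooled.reshape(1, -1, d).shape)   # torch.Size([1, 1568, 512]) vs. 6,272 tokens before
```
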
Dual line graphs showing factual QA accuracy and NLL against memory size for NQ and TQA datasets in AI models.

Memory Layers for More-Factual Output: Meta researchers build Llama-style models that recall details without needing more computing resources

Improving a large language model’s factual accuracy typically requires making it bigger, which, in turn, involves more computation. Researchers devised an architecture that enables models to recall relevant details without significantly increasing the amount of computation required.
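
Here's a toy sketch of a key-value memory layer: a large table of trainable keys and values that each token queries, keeping only its top-k matches. For clarity, this version scores every key by brute force; the published design organizes keys so that the lookup itself stays cheap. Sizes are illustrative assumptions.

```python
# Toy memory layer: trainable keys and values queried with a top-k lookup,
# so only a handful of memory slots contribute to each token's update.
# Brute-force key scoring is used here for clarity only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    def __init__(self, d_model=128, num_slots=65_536, top_k=8):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.values = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, seq, d_model)
        scores = x @ self.keys.T                 # similarity to every memory key
        top = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top.values, dim=-1)            # (batch, seq, k)
        picked = self.values[top.indices]                  # (batch, seq, k, d_model)
        return x + (weights.unsqueeze(-1) * picked).sum(-2)

layer = MemoryLayer()
tokens = torch.randn(2, 16, 128)
print(layer(tokens).shape)    # torch.Size([2, 16, 128])
```
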
Diagram of LLM-based preference approximation and multimodal sequential recommendation for personalized product suggestions.

Inferring Customer Preferences: LLMs boost shopping recommendations by decoding what users want

Large language models can improve systems that recommend items to purchase by inferring customer preferences.
Diagram of latent transformer model using byte-level encoding, patching, and cross-attention for next-byte prediction.

Toward LLMs That Understand Misspellings: New byte-based model beats Llama 3 on spelling, noise, and translation

Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R's in strawberry, and potentially better able to understand unfamiliar languages that might share groups of letters with familiar languages.
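
The sketch below shows the byte-patching idea in its simplest form, with fixed-size patches (an assumption for the example): raw bytes are embedded individually, then pooled into patch vectors, so the main transformer sees far fewer positions than one per byte.

```python
# Byte patching sketch: group raw UTF-8 bytes into patches and embed each
# patch by pooling its byte embeddings. Fixed-size patches and mean pooling
# are simplifications for illustration.

import torch
import torch.nn as nn

d_model, patch_size = 64, 4
byte_embed = nn.Embedding(256, d_model)          # one embedding per possible byte value

def embed_patches(text):
    data = list(text.encode("utf-8"))
    data += [0] * ((-len(data)) % patch_size)    # pad to a whole number of patches
    ids = torch.tensor(data).view(-1, patch_size)    # (num_patches, patch_size)
    return byte_embed(ids).mean(dim=1)               # pool bytes into patch vectors

patches = embed_patches("strawberry has three r's")
print(patches.shape)   # torch.Size([6, 64]): 6 patches instead of 24 byte positions
```
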
TabPFN neural network diagram showing synthetic training, prediction on real-world tabular data, and attention layers.

Better Than Trees for Tabular Data: Transformers can outperform decision trees at predicting unlabeled spreadsheet cells

If you have a collection of variables that describe, say, a patient, and you want to classify the patient’s illness as likely cancer or not, algorithms based on decision trees, such as gradient-boosted trees, typically perform better than neural networks.
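
Here's a toy sketch of the in-context approach a transformer can take: labeled rows and unlabeled rows go through a single forward pass, and class predictions are read off at the unlabeled positions. It's untrained and purely illustrative; the actual model is pretrained on synthetic tables before it ever sees real-world data.

```python
# In-context tabular prediction sketch: encode labeled and unlabeled rows,
# run them through one transformer pass, and classify the unlabeled rows.
# Untrained toy; outputs are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

num_features, num_classes, d_model = 10, 2, 64

encode_row = nn.Linear(num_features + num_classes, d_model)   # features + label slot
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
classify = nn.Linear(d_model, num_classes)

def predict(train_X, train_y, test_X):
    labeled = torch.cat([train_X, F.one_hot(train_y, num_classes).float()], dim=-1)
    unlabeled = torch.cat([test_X, torch.zeros(len(test_X), num_classes)], dim=-1)
    rows = encode_row(torch.cat([labeled, unlabeled])).unsqueeze(0)
    h = backbone(rows)[0, len(train_X):]          # hidden states of the unlabeled rows
    return classify(h).argmax(dim=-1)

train_X, train_y = torch.randn(100, num_features), torch.randint(0, 2, (100,))
test_X = torch.randn(5, num_features)
print(predict(train_X, train_y, test_X))
```
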
Diagram of Coconut, a method training LLMs to process thought chains as vectors, comparing it to Chain-of-Thought (CoT).

Reasoning in Vectors, Not Text: Meta introduces Chain of Continuous Thought (Coconut) to improve next-token prediction

Large language models can improve their performance by generating a chain of thought (CoT), intermediate text tokens that break down the process of responding to a prompt into a series of steps. Meta’s Coconut lets a model carry out those intermediate steps as vectors rather than text.
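
The sketch below shows the core mechanism with a toy backbone standing in for an LLM: at each intermediate step, the model's last hidden state is appended back to the input as a vector rather than being decoded into a token, and text is decoded only at the end. Training details are omitted, and the tiny model here is an assumption for illustration.

```python
# Continuous "thought" steps: feed the last hidden state back as the next
# input embedding instead of decoding a token, then decode text at the end.
# Toy backbone; causal masking and training are omitted for brevity.

import torch
import torch.nn as nn

d_model, vocab = 64, 1000
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab)

def answer_with_latent_thoughts(prompt_ids, num_thoughts=4):
    seq = embed(prompt_ids)                       # (1, T, d_model)
    for _ in range(num_thoughts):
        h = backbone(seq)
        thought = h[:, -1:, :]                    # last hidden state = one "thought"
        seq = torch.cat([seq, thought], dim=1)    # fed back as a vector, not a token
    logits = lm_head(backbone(seq)[:, -1])        # decode text only at the end
    return logits.argmax(dim=-1)

prompt = torch.randint(0, vocab, (1, 8))
print(answer_with_latent_thoughts(prompt))
```
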
X-CLR loss: training models to link text captions and image similarity.

Calibrating Contrast: X-CLR, an approach to contrastive learning for better vision models

Contrastive loss functions make it possible to produce good embeddings without labeled data. A twist on this idea makes even more useful embeddings.
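
Here's a minimal sketch of that twist, under the assumption of a softened contrastive objective: instead of binary same/different targets, the target distribution over a batch of images comes from how similar their captions are. The temperatures and the exact form of the loss are assumptions for the example.

```python
# Soft contrastive loss sketch: image-image similarities are trained to
# match a graded target distribution derived from caption similarities.
# Temperatures and the loss form are illustrative assumptions.

import torch
import torch.nn.functional as F

def soft_contrastive_loss(image_emb, caption_emb, tau_img=0.1, tau_txt=0.1):
    """image_emb, caption_emb: (N, d) embeddings for the same N samples."""
    img = F.normalize(image_emb, dim=-1)
    txt = F.normalize(caption_emb, dim=-1)
    logits = img @ img.T / tau_img                   # image-image similarities
    targets = F.softmax(txt @ txt.T / tau_txt, -1)   # graded targets from captions
    # Cross-entropy between the two distributions over the batch.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()

images = torch.randn(16, 128)     # stand-ins for image-encoder outputs
captions = torch.randn(16, 128)   # stand-ins for text-encoder outputs
print(soft_contrastive_loss(images, captions))
```
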