Transformers

GIF showing fluid simulation with patch jittering, followed by predicted vs. actual flows across time steps 0–38.

How Liquids and Gases Behave: A dynamic fluids model appears to solve transformers’ pixellation problem

Simulating complex physical systems using traditional numerical methods is slow and expensive, and simulations based on machine learning are usually specialized for a specific type of system, such as water in a pipe or the atmosphere surrounding a planet.
Diagram shows DNA analysis with interconnected devices, output types, and species-specific data.

Dark DNA Unveiled: Google’s AlphaGenome interprets DNA that regulates genetic expression

An open-weights model could help scientists compare the impact of genetic variations, identify mutations that cause diseases, and develop treatments.
Two graphs show TTT-E2E maintains stable loss and latency across increasing context lengths up to 128k.

Learning Long Context at Inference: Test-Time Training End-to-End (TTT-E2E) retrains model weights to handle long inputs

Large language models typically become less accurate and slower when they process longer contexts, but researchers enabled an LLM to keep accuracy stable and inference time constant as its context grew.
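The general idea behind test-time training is to treat the incoming context itself as training data: before (or while) answering, the model takes a few gradient steps on a next-token-prediction loss over the context, so the relevant information is compressed into its weights rather than an ever-growing attention cache. The Python sketch below illustrates only that general recipe; the chunk size, learning rate, choice of trainable parameters, and the Hugging Face-style model interface are assumptions for illustration, not the TTT-E2E method itself.

# Rough sketch of test-time training on a long context (illustrative assumptions throughout;
# not the TTT-E2E recipe). Assumes a Hugging Face-style causal LM whose forward pass returns .logits.
import torch
import torch.nn.functional as F

def test_time_adapt(model, context_ids, chunk_len=2048, lr=1e-4, steps_per_chunk=1):
    """Take a few gradient steps on next-token prediction over the context,
    updating whatever parameters are marked trainable (an assumption of this sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    for start in range(0, context_ids.size(1) - 1, chunk_len):
        chunk = context_ids[:, start:start + chunk_len + 1]
        if chunk.size(1) < 2:
            break
        for _ in range(steps_per_chunk):
            logits = model(chunk[:, :-1]).logits          # predict each next token in the chunk
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                chunk[:, 1:].reshape(-1),
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model  # the weights now carry a compressed memory of the context
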

A Single Tokenizer for Visual Media: Apple’s AToken, a multimodal model with a single encoder and tokenizer for images, videos, and 3D objects

Multimodal models typically use different tokenizers to embed different media types, and different encoders depending on whether they’re trained to generate media or to classify it.
Diagram shows SleepFM's data processing flow from sleep signals to disease prediction using neural networks.

Sleep Signals Predict Illness: SleepFM detects signs of neurological disorders years before symptoms manifest

Difficulty sleeping often precedes heart disease, psychiatric disorders, and many other illnesses. Researchers used data gathered during sleep studies to detect such conditions.
Two comparison tables show AI model performance across varied benchmarks, highlighting LFM2.5-1.2B.

Faster Reasoning at the Edge: Liquid AI’s small reasoning model mixes attention with convolutional layers for efficiency

Reasoning models in the 1 to 2 billion-parameter range typically require more than 1 gigabyte of RAM to run. Liquid AI released one that runs in less than 900 megabytes, and does so with exceptional speed and efficiency.
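The headline’s mix of attention and convolutional layers is what keeps memory and compute low: short convolutions mix nearby tokens in roughly linear time, while attention layers handle long-range dependencies. The block below is a generic sketch of such a hybrid (layer sizes, normalization, and ordering are assumptions, not Liquid AI’s published architecture).

# Generic sketch of a hybrid block that mixes a short causal convolution with attention
# (sizes and ordering are illustrative assumptions, not LFM2.5's actual design).
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, dim=512, heads=8, kernel=4):
        super().__init__()
        # Depthwise convolution: O(seq * kernel) work instead of attention's O(seq^2).
        self.conv = nn.Conv1d(dim, dim, kernel_size=kernel, groups=dim, padding=kernel - 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = self.norm1(x)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)  # trim to keep it causal
        x = x + h                               # cheap local mixing
        h = self.norm2(x)
        mask = torch.triu(torch.ones(x.size(1), x.size(1), dtype=torch.bool, device=x.device), 1)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        return x + a                            # global mixing via attention
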
Flowchart showing Mistral Small 3.1 model distillation into smaller Ministral 3 models with post-training steps.

Recipe for Smaller, Capable Models: Mistral uses cascade distillation on Mistral 3 to build Ministral family

Mistral compressed Mistral Small 3.1 into much smaller versions, yielding a family of relatively small, open-weights, vision-language models that perform better by some measures than competing models of similar size. The method combines pruning and distillation.
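In broad strokes, a pruning-plus-distillation pipeline first removes parameters from the large model, then trains the smaller result to match the original’s output distribution. The sketch below shows one simple version of each ingredient; the magnitude-pruning criterion, temperature, and loss weighting are illustrative assumptions rather than Mistral’s settings.

# Generic pruning-then-distillation sketch (criterion and loss weights are illustrative assumptions).
import torch
import torch.nn.functional as F

def prune_by_magnitude(model, keep_ratio=0.5):
    """Zero out the smallest-magnitude weights in each linear layer (one simple pruning criterion)."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.data
            k = max(1, int(w.numel() * (1 - keep_ratio)))
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() >= threshold).float())

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend soft-label KL divergence against the teacher with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
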
Collage with comic strip, concert poster, diagrams on water cycle and trash sorting, and movie poster.

Refining Words in Pictures: Z.ai’s GLM-Image blends transformer and diffusion architectures for better text in images

Image generators often mangle text. An open-weights model outperforms open and proprietary competitors in text rendering.
View from a car on a tree-lined street, with an overlay instructing to decelerate if hazards are detected.

Training Cars to Reason: Nvidia’s Alpamayo-R1 is a robotics-style reasoning model for autonomous vehicles

Chain-of-thought reasoning can help autonomous vehicles decide what to do next.
Graph shows Ernie-4.5 outperforming competitors in document understanding and visual reasoning tasks.

Baidu’s Multimodal Bids: Giant Ernie 5 natively generates multiple media; Ernie-4.5-VL-28B-A3B-Thinking tops vision-language metrics

Baidu debuted two models: a lightweight, open-weights, vision-language model and a giant, proprietary, multimodal model built to take on U.S. competitors.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between each.
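The pattern described, alternating a private reasoning step with a tool call over hundreds of rounds, amounts to a simple control loop. The sketch below shows that loop in Python; the method names, message format, and stopping rule are assumptions for illustration, not Moonshot AI’s interface.

# Sketch of an interleaved think-then-act agent loop (function names and message
# format are illustrative assumptions, not Moonshot AI's actual interface).
def run_agent(llm, tools, task, max_calls=300):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_calls):
        reply = llm.generate(messages)                 # may contain reasoning plus a tool request
        messages.append({"role": "assistant", "content": reply.text})
        if reply.tool_call is None:                    # the model decided it has the final answer
            return reply.text
        result = tools[reply.tool_call.name](**reply.tool_call.arguments)
        messages.append({"role": "tool", "content": str(result)})  # feed the result back for the next step
    return "stopped after max_calls tool calls"
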
Series of graphs transformed via tokenization and transformer layers, resulting in predicted outputs.

Forecasting Multiple Time Series: Amazon’s Chronos-2 sorts out tangled variables to make better predictions

Transformers are well suited to predicting future values of time series like energy prices, wages, or weather, but, as in those examples, multiple time series often influence one another. Researchers built a model that can forecast several interrelated time series simultaneously.
A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.
Graphs compare DeepSeek models showing reduced cost per million tokens with V3.2-Exp over V3.1-Terminus.

DeepSeek Cuts Inference Costs: DeepSeek-V3.2-Exp streamlines processing using a "lightning indexer," boosting efficiency

DeepSeek’s latest large language model can cut inference costs by more than half and process long contexts dramatically faster than its predecessor.
Energy-Based Transformer refines predictions step by step, lowering energy for higher context compatibility.

Transformers Energized: Energy-Based Transformers (EBTs) use gradient descent to gradually predict the next token

A new type of transformer can check its work. Instead of guessing the next output token in one shot like a typical transformer, it starts with a rough version of the token and improves it step by step.
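In an energy-based formulation, a network scores how compatible a candidate next token is with the context, and lower energy means a better fit. Rather than emitting the token in a single forward pass, the model refines a rough candidate by gradient descent on that energy. The sketch below shows such a refinement loop; the energy network, step size, and number of steps are illustrative assumptions, not the EBT authors’ exact design.

# Sketch of energy-based next-token refinement: start from a rough candidate embedding
# and lower the energy by gradient descent (energy_net and hyperparameters are assumptions).
import torch

def refine_prediction(energy_net, context_repr, init_candidate, steps=8, step_size=0.1):
    """energy_net(context_repr, candidate) -> scalar energy; lower means more compatible."""
    candidate = init_candidate.clone().requires_grad_(True)
    for _ in range(steps):
        energy = energy_net(context_repr, candidate).sum()
        grad, = torch.autograd.grad(energy, candidate)
        with torch.no_grad():
            candidate -= step_size * grad              # move the candidate toward lower energy
        candidate.requires_grad_(True)
    return candidate.detach()                          # refined embedding of the predicted token
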