Diffusion Models

20 Posts

Bar chart comparing HunyuanImage 3.0's performance with Nano Banana and Seedream 4.0.

Better Images Through Reasoning: HunyuanImage-3.0 uses reinforcement learning and thinking tokens to better understand prompts

A new image generator reasons over prompts to produce outstanding pictures.
Three AI-generated video clips: a man vaulting over a moving car, a gymnast flipping on a plane wing, and a rabbit ice skating in pink boots.

Mixture of Video Experts: Alibaba’s Wan 2.2 video models adopt a new architecture to sort noisy from less-noisy inputs

The mixture-of-experts approach that has boosted the performance of large language models may do the same for video generation.
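
To make the title's "sort noisy from less-noisy inputs" concrete, here is a minimal, hypothetical PyTorch sketch of routing by noise level: two expert denoisers, with a threshold (`boundary_t`, an assumed hyperparameter rather than Wan 2.2's actual setting) deciding whether the high-noise expert or the low-noise expert handles each denoising step.

```python
import torch
import torch.nn as nn

class NoiseRoutedDenoiser(nn.Module):
    """Two-expert denoiser: one expert handles noisy (early) steps, the other
    handles cleaner (late) steps. The boundary is an assumption, not Wan 2.2's value."""

    def __init__(self, high_noise_expert: nn.Module, low_noise_expert: nn.Module,
                 boundary_t: float = 0.5):
        super().__init__()
        self.high_noise_expert = high_noise_expert
        self.low_noise_expert = low_noise_expert
        self.boundary_t = boundary_t

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        # Route the whole denoising step to one expert based on the noise level t.
        expert = self.high_noise_expert if t >= self.boundary_t else self.low_noise_expert
        return expert(x)

# Toy linear experts stand in for full video diffusion transformers.
model = NoiseRoutedDenoiser(nn.Linear(8, 8), nn.Linear(8, 8))
x = torch.randn(1, 8)
for step in range(10, 0, -1):
    t = step / 10.0               # normalized noise level, 1.0 = pure noise
    x = x - 0.1 * model(x, t)     # placeholder update rule, for illustration only
```
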
Diagram of REPA aligning diffusion transformer (DiT/SiT) features with DINOv2 encoder embeddings.

Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.
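
Roughly, the idea is an auxiliary alignment loss added to the usual denoising objective. The sketch below, with placeholder dimensions and loss weight rather than REPA's actual settings, projects a DiT/SiT block's intermediate features into DINOv2's embedding space and rewards cosine similarity with the frozen DINOv2 patch embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions; real DiT/SiT hidden sizes and DINOv2 widths differ.
HIDDEN_DIM, DINO_DIM = 256, 384

projector = nn.Linear(HIDDEN_DIM, DINO_DIM)  # maps diffusion features into DINOv2's space

def alignment_loss(diffusion_features: torch.Tensor,
                   dino_features: torch.Tensor) -> torch.Tensor:
    """Alignment term in the spirit of REPA: encourage intermediate
    diffusion-transformer features (per patch token) to match frozen
    DINOv2 embeddings via a learned projection and cosine similarity."""
    projected = projector(diffusion_features)            # (batch, tokens, DINO_DIM)
    cos = F.cosine_similarity(projected, dino_features, dim=-1)
    return -cos.mean()                                   # maximize similarity

# During training, this term is added to the denoising loss; the weight is an assumption.
diff_feats = torch.randn(2, 16, HIDDEN_DIM)   # stand-in for a DiT block's output
dino_feats = torch.randn(2, 16, DINO_DIM)     # stand-in for frozen DINOv2 patch embeddings
total_loss = 1.0 * alignment_loss(diff_feats, dino_feats)
```
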
Diagram comparing diffusion, flow matching, and shortcut models for image generation with fewer steps.

Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which takes time at inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.
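
The cost at issue is easy to see in code: each denoising step is a full forward pass, so inference time scales with the step count. The sketch below is a generic Euler-style sampler with a toy network, included only to illustrate that scaling; it is not the shortcut-model method itself.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for an image diffusion model, conditioned on the noise level."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def sample(denoiser: nn.Module, shape: tuple, num_steps: int) -> torch.Tensor:
    """Each step is one forward pass, so inference cost grows linearly with num_steps."""
    x = torch.randn(shape)                        # start from pure noise
    for i in range(num_steps, 0, -1):
        t = torch.full((shape[0],), i / num_steps)
        x = x - denoiser(x, t) / num_steps        # simple Euler-style update (placeholder rule)
    return x

denoiser = ToyDenoiser()
slow = sample(denoiser, (1, 8), num_steps=128)    # many small steps: slow but typically higher quality
fast = sample(denoiser, (1, 8), num_steps=4)      # few large steps: the regime shortcut models target
```
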
Scientific diagram of a denoising model generating stable materials from random elements based on chemistry and symmetry.

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting the next token one at a time, from left to right. A new model refines all text tokens at once.
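
The control-flow difference is the key point: an autoregressive model makes one sequential call per token, while a masked-diffusion-style model revises all positions over a small, fixed number of parallel rounds. The toy sketch below illustrates only that structure; the random choices stand in for model predictions and are not Mercury Coder's actual method.

```python
import random

VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]

def autoregressive_generate(length: int) -> list:
    """Left to right: one sequential 'model call' per token."""
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))      # stand-in for next-token prediction
    return tokens

def diffusion_style_generate(length: int, rounds: int = 4) -> list:
    """Start fully masked, then fill in a fraction of the remaining masked
    positions in parallel during each of a fixed number of rounds."""
    tokens = ["<mask>"] * length
    for r in range(rounds):
        masked = [i for i, tok in enumerate(tokens) if tok == "<mask>"]
        if not masked:
            break
        k = max(1, len(masked) // (rounds - r))  # simple unmasking schedule (an assumption)
        for i in random.sample(masked, k):
            tokens[i] = random.choice(VOCAB)     # stand-in for the model's parallel predictions
    return tokens

print(autoregressive_generate(8))   # 8 sequential steps
print(diffusion_style_generate(8))  # 4 parallel refinement rounds
```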

David Ding: Generated video with music, sound effects, and dialogue

Last year, we saw an explosion of models that generate either video or audio outputs in high quality. In the coming year, I look forward to models that produce video clips complete with audio soundtracks including speech, music, and sound effects.
A GIF with scenes of a man at a café, a working robot, a ghost in a mirror, and a speeding truck.

Open Video Gen Closes the Gap: Tencent releases HunyuanVideo, an open source model rivaling commercial video generators

The gap is narrowing between closed and open models for video generation.
Game character climbing a ladder with visible controls (QWASD) and health bars.

Game Worlds on Tap: Genie 2 brings interactive 3D worlds to life

A new model improves on recent progress in generating interactive virtual worlds from still images.
Berkeley Function Calling Leaderboard with metrics like accuracy, latency, and relevance.

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.
Diagram of pyramidal flow matching, showing temporal pyramids (left) and position encoding in the space-time pyramid.

Faster, Cheaper Video Generation: Pyramidal Flow Matching, a cost-cutting method for training video generators

Researchers devised a way to cut the cost of training video generators. They used it to build a competitive open source text-to-video model and promised to release the training code.

For Faster Diffusion, Think a GAN: Adversarial Diffusion Distillation, a method to accelerate diffusion models

Generative adversarial networks (GANs) produce images quickly, but they’re of relatively low quality. Diffusion image generators typically take more time, but they produce higher-quality output. Researchers aimed to achieve the best of both worlds.
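
In outline, the student is a fast (one- or few-step) generator trained with two signals at once: a distillation term that pulls its output toward a frozen diffusion teacher's result, and a GAN-style adversarial term from a discriminator. The PyTorch sketch below uses toy linear networks and an assumed loss weight, so it conveys the shape of the objective rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: the real setup uses a pretrained diffusion teacher, a few-step
# student initialized from it, and a feature-based image discriminator.
student = nn.Linear(16, 16)         # fast generator: a single forward pass from noise
teacher = nn.Linear(16, 16)         # frozen, slow, high-quality diffusion model (stand-in)
discriminator = nn.Linear(16, 1)    # judges whether an output looks like a real image

def student_loss(noise: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Distillation term plus adversarial term; the weight `lam` is an assumption."""
    fake = student(noise)
    with torch.no_grad():
        target = teacher(noise)                            # stand-in for the teacher's many-step output
    distill = F.mse_loss(fake, target)                     # pull the student toward the teacher
    adversarial = F.softplus(-discriminator(fake)).mean()  # non-saturating GAN generator loss
    return distill + lam * adversarial

loss = student_loss(torch.randn(4, 16))
loss.backward()   # in practice the discriminator is trained in alternation with the student
```
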
Excerpt from Google Pixel 8 promotional video

Generative AI Calling: Google brings advanced computer vision and audio tech to Pixel 8 and 8 Pro phones.

Google’s new mobile phones put advanced computer vision and audio research into consumers’ hands. The Alphabet division introduced its flagship Pixel 8 and Pixel 8 Pro smartphones at its annual hardware-launch event. Both units feature AI-powered tools for editing photos and videos.

Diffusion Transformed: A new class of diffusion models based on the transformer architecture

A tweak to diffusion models, which are responsible for most of the recent excitement about AI-generated images, enables them to produce more realistic output.

Stable Biases: Stable Diffusion may amplify biases in its training data.

Stable Diffusion may amplify biases in its training data in ways that promote deeply ingrained social stereotypes.