Diffusion Models

20 Posts

Bar chart comparing HunyuanImage 3.0's performance with Nano Banana and Seedream 4.0.

Better Images Through Reasoning: HunyuanImage-3.0 uses reinforcement learning and thinking tokens to better understand prompts

A new image generator reasons over prompts to produce outstanding pictures.
Three AI-generated video clips: a man vaulting over a moving car, a gymnast flipping on a plane wing, and a rabbit ice skating in pink boots.

Mixture of Video Experts: Alibaba’s Wan 2.2 video models adopt a new architecture to sort noisy from less-noisy inputs

The mixture-of-experts approach that has boosted the performance of large language models may do the same for video generation.
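
To make the title's "sort noisy from less-noisy inputs" concrete, here is a minimal, hypothetical PyTorch sketch of routing by noise level: two expert denoisers, with a threshold (`boundary_t`, an assumed hyperparameter rather than Wan 2.2's actual setting) deciding whether the high-noise expert or the low-noise expert handles each denoising step.

```python
import torch
import torch.nn as nn

class NoiseRoutedDenoiser(nn.Module):
    """Two-expert denoiser: one expert handles noisy (early) steps, the other
    handles cleaner (late) steps. The boundary is an assumption, not Wan 2.2's value."""

    def __init__(self, high_noise_expert: nn.Module, low_noise_expert: nn.Module,
                 boundary_t: float = 0.5):
        super().__init__()
        self.high_noise_expert = high_noise_expert
        self.low_noise_expert = low_noise_expert
        self.boundary_t = boundary_t

    def forward(self, x: torch.Tensor, t: float) -> torch.Tensor:
        # Route the whole denoising step to one expert based on the noise level t.
        expert = self.high_noise_expert if t >= self.boundary_t else self.low_noise_expert
        return expert(x)

# Toy linear experts stand in for full video diffusion transformers.
model = NoiseRoutedDenoiser(nn.Linear(8, 8), nn.Linear(8, 8))
x = torch.randn(1, 8)
for step in range(10, 0, -1):
    t = step / 10.0               # normalized noise level, 1.0 = pure noise
    x = x - 0.1 * model(x, t)     # placeholder update rule, for illustration only
```
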
Diagram of REPA aligning diffusion transformer (DiT/SiT) features with DINOv2 encoder embeddings.

Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.
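
Roughly, the idea is an auxiliary alignment loss added to the usual denoising objective. The sketch below, with placeholder dimensions and loss weight rather than REPA's actual settings, projects a DiT/SiT block's intermediate features into DINOv2's embedding space and rewards cosine similarity with the frozen DINOv2 patch embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions; real DiT/SiT hidden sizes and DINOv2 widths differ.
HIDDEN_DIM, DINO_DIM = 256, 384

projector = nn.Linear(HIDDEN_DIM, DINO_DIM)  # maps diffusion features into DINOv2's space

def alignment_loss(diffusion_features: torch.Tensor,
                   dino_features: torch.Tensor) -> torch.Tensor:
    """Alignment term in the spirit of REPA: encourage intermediate
    diffusion-transformer features (per patch token) to match frozen
    DINOv2 embeddings via a learned projection and cosine similarity."""
    projected = projector(diffusion_features)            # (batch, tokens, DINO_DIM)
    cos = F.cosine_similarity(projected, dino_features, dim=-1)
    return -cos.mean()                                   # maximize similarity

# During training, this term is added to the denoising loss; the weight is an assumption.
diff_feats = torch.randn(2, 16, HIDDEN_DIM)   # stand-in for a DiT block's output
dino_feats = torch.randn(2, 16, DINO_DIM)     # stand-in for frozen DINOv2 patch embeddings
total_loss = 1.0 * alignment_loss(diff_feats, dino_feats)
```
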
Diagram comparing diffusion, flow matching, and shortcut models for image generation with fewer steps.

Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which takes time at inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.
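
The cost at issue is easy to see in code: each denoising step is a full forward pass, so inference time scales with the step count. The sketch below is a generic Euler-style sampler with a toy network, included only to illustrate that scaling; it is not the shortcut-model method itself.

```python
import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for an image diffusion model, conditioned on the noise level."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t[:, None]], dim=-1))

def sample(denoiser: nn.Module, shape: tuple, num_steps: int) -> torch.Tensor:
    """Each step is one forward pass, so inference cost grows linearly with num_steps."""
    x = torch.randn(shape)                        # start from pure noise
    for i in range(num_steps, 0, -1):
        t = torch.full((shape[0],), i / num_steps)
        x = x - denoiser(x, t) / num_steps        # simple Euler-style update (placeholder rule)
    return x

denoiser = ToyDenoiser()
slow = sample(denoiser, (1, 8), num_steps=128)    # many small steps: slow but typically higher quality
fast = sample(denoiser, (1, 8), num_steps=4)      # few large steps: the regime shortcut models target
```
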
Scientific diagram of a denoising model generating stable materials from random elements based on chemistry and symmetry.

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting the next token one at a time, from left to right. A new model refines all text tokens at once.
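
The control-flow difference is the key point: an autoregressive model makes one sequential call per token, while a masked-diffusion-style model revises all positions over a small, fixed number of parallel rounds. The toy sketch below illustrates only that structure; the random choices stand in for model predictions and are not Mercury Coder's actual method.

```python
import random

VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]

def autoregressive_generate(length: int) -> list:
    """Left to right: one sequential 'model call' per token."""
    tokens = []
    for _ in range(length):
        tokens.append(random.choice(VOCAB))      # stand-in for next-token prediction
    return tokens

def diffusion_style_generate(length: int, rounds: int = 4) -> list:
    """Start fully masked, then fill in a fraction of the remaining masked
    positions in parallel during each of a fixed number of rounds."""
    tokens = ["<mask>"] * length
    for r in range(rounds):
        masked = [i for i, tok in enumerate(tokens) if tok == "<mask>"]
        if not masked:
            break
        k = max(1, len(masked) // (rounds - r))  # simple unmasking schedule (an assumption)
        for i in random.sample(masked, k):
            tokens[i] = random.choice(VOCAB)     # stand-in for the model's parallel predictions
    return tokens

print(autoregressive_generate(8))   # 8 sequential steps
print(diffusion_style_generate(8))  # 4 parallel refinement rounds
```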

David Ding: Generated video with music, sound effects, and dialogue

Last year, we saw an explosion of models that generate either video or audio outputs in high quality. In the coming year, I look forward to models that produce video clips complete with audio soundtracks including speech, music, and sound effects.
A GIF with scenes of a man at a café, a working robot, a ghost in a mirror, and a speeding truck.

Open Video Gen Closes the Gap: Tencent releases HunyuanVideo, an open source model rivaling commercial video generators

The gap is narrowing between closed and open models for video generation.
Game character climbing a ladder with visible controls (QWASD) and health bars.

Game Worlds on Tap: Genie 2 brings interactive 3D worlds to life

A new model improves on recent progress in generating interactive virtual worlds from still images.
Berkeley Function Calling Leaderboard with metrics like accuracy, latency, and relevance.

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.
Diagram of pyramidal flow matching, showing temporal pyramids (left) and position encoding in the space-time pyramid.

Faster, Cheaper Video Generation: Pyramidal Flow Matching, a cost-cutting method for training video generators

Researchers devised a way to cut the cost of training video generators. They used it to build a competitive open source text-to-video model and promised to release the training code.

For Faster Diffusion, Think a GAN: Adversarial Diffusion Distillation, a method to accelerate diffusion models

Generative adversarial networks (GANs) produce images quickly, but they’re of relatively low quality. Diffusion image generators typically take more time, but they produce higher-quality output. Researchers aimed to achieve the best of both worlds.
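
In outline, the student is a fast (one- or few-step) generator trained with two signals at once: a distillation term that pulls its output toward a frozen diffusion teacher's result, and a GAN-style adversarial term from a discriminator. The PyTorch sketch below uses toy linear networks and an assumed loss weight, so it conveys the shape of the objective rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: the real setup uses a pretrained diffusion teacher, a few-step
# student initialized from it, and a feature-based image discriminator.
student = nn.Linear(16, 16)         # fast generator: a single forward pass from noise
teacher = nn.Linear(16, 16)         # frozen, slow, high-quality diffusion model (stand-in)
discriminator = nn.Linear(16, 1)    # judges whether an output looks like a real image

def student_loss(noise: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
    """Distillation term plus adversarial term; the weight `lam` is an assumption."""
    fake = student(noise)
    with torch.no_grad():
        target = teacher(noise)                            # stand-in for the teacher's many-step output
    distill = F.mse_loss(fake, target)                     # pull the student toward the teacher
    adversarial = F.softplus(-discriminator(fake)).mean()  # non-saturating GAN generator loss
    return distill + lam * adversarial

loss = student_loss(torch.randn(4, 16))
loss.backward()   # in practice the discriminator is trained in alternation with the student
```
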
Excerpt from Google Pixel 8 promotional video

Generative AI Calling: Google brings advanced computer vision and audio tech to Pixel 8 and 8 Pro phones.

Google’s new mobile phones put advanced computer vision and audio research into consumers’ hands. The Alphabet division introduced its flagship Pixel 8 and Pixel 8 Pro smartphones at its annual hardware-launch event. Both units feature AI-powered tools for editing photos and videos.

Diffusion Transformed: A new class of diffusion models based on the transformer architecture

A tweak to diffusion models, which are responsible for most of the recent excitement about AI-generated images, enables them to produce more realistic output.

Stable Biases: Stable Diffusion may amplify biases in its training data.

Stable Diffusion may amplify biases in its training data in ways that promote deeply ingrained social stereotypes.