Machine Learning Research

426 Posts

Visual model aligning diffusion embeddings with DINOv2 encoders using REPA and DiT/SiT blocks.

Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.
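The core of this approach (known as REPA) can be sketched as an auxiliary alignment loss: a projection of an intermediate DiT/SiT hidden state is pushed toward the frozen DINOv2 embedding of the same image patch. The toy function below is a minimal illustration under that assumption, not the paper's implementation; the names and 2-dimensional vectors are made up.

```python
import math

def cosine_alignment_loss(hidden, target):
    """REPA-style auxiliary loss (illustrative sketch): 1 minus the cosine
    similarity between a projected diffusion-transformer hidden state and
    the frozen DINOv2 embedding of the same image patch."""
    dot = sum(h * t for h, t in zip(hidden, target))
    norm_h = math.sqrt(sum(h * h for h in hidden))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / (norm_h * norm_t)

# Vectors pointing the same way are already aligned: loss is ~0
# (up to floating-point error), so this term adds no gradient pressure.
print(cosine_alignment_loss([1.0, 2.0], [2.0, 4.0]))
```

During training, this term would be added to the usual denoising loss, so the transformer's representations are pulled toward the pretrained encoder's.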
Diagram comparing diffusion, flow matching, and shortcut models for image generation with fewer steps.

Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which takes time at inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.
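The shortcut idea can be sketched as a self-consistency constraint: the model takes the step size as an extra input and is trained so that one step of size 2d lands where two consecutive steps of size d do, letting it jump through denoising in fewer, larger steps. The toy constant-velocity "model" below satisfies the constraint exactly; a learned model is trained toward it. Names are illustrative, not from the paper.

```python
def step(x, d, velocity):
    """One denoising step of size d under a toy constant-velocity model."""
    return x + d * velocity

# Shortcut self-consistency (sketch): one step of size 2d should reach
# the same point as two consecutive steps of size d.
x, d, v = 1.0, 0.25, 3.0
one_big_step = step(x, 2 * d, v)
two_small_steps = step(step(x, d, v), d, v)
print(one_big_step, two_small_steps)
```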
Comparison table of Gemini and Gemma models across benchmarks like MMLU, MATH, and GPQA with radar charts.

Vision-Language, Compact and Open: Google releases Gemma 3 vision-language models with open weights

Google updated its open-weights family of large language models to include versions that handle image and video inputs.
Scientific diagram of a denoising model generating stable materials from random elements based on chemistry and symmetry.

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.
AI co-scientist workflow diagram showing a research goal assigned to specialized AI agents for hypothesis testing and ranking.

Science Research Proposals Made to Order: AI Co-Scientist, an agent that generates research hypotheses, aiding drug discovery

An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.
AYA Vision architecture diagram showing vision encoder, multimodal merging, and LLM backbone for image processing.

Equally Fluent in Many Languages: Cohere’s Aya Vision beats multilingual rivals in text & image understanding

Multilingual AI models often suffer uneven performance across languages, especially in multimodal tasks. A pair of lean models counters this trend with consistent understanding of text and images across major languages.
AI model performance benchmark comparing R1 1776 and DeepSeek-R1 across MMLU, DROP, MATH-500, and AIME 2024 tests.

DeepSeek-R1 Uncensored: Perplexity launches uncensored version of DeepSeek-R1

Large language models built by developers in China may, in some applications, be less useful outside that country because they avoid topics its government deems politically sensitive. A developer fine-tuned DeepSeek-R1 to widen its scope without degrading its overall performance.
Phi-4 Mini multimodal architecture integrating vision, audio, and text with token merging and LoRA-adapted weights for AI processing.

Microsoft Tackles Voice-In, Text-Out: Microsoft’s Phi-4 Multimodal model can process text, images, and speech simultaneously

Microsoft debuted its first official large language model that responds to spoken input.
QwQ-32B vs DeepSeek-R1 AI model performance benchmark across AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL datasets.

Compact Reasoning: QwQ-32B challenges DeepSeek-R1 and other larger reasoning models

Most models that have learned to reason via reinforcement learning are very large. A much smaller model now competes with them.
Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

OpenAI launched GPT-4.5, which may be its last non-reasoning model.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting tokens one at a time from left to right. A new model refines all text tokens at once.
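The contrast with left-to-right decoding can be sketched as iterative parallel refinement: start from fully masked tokens and, at each step, fill in positions anywhere in the sequence rather than only the next one. The toy loop below is a sketch of that decoding pattern, not Mercury Coder's actual algorithm, and it assumes the model's predictions are always correct.

```python
import random

def denoise_all_positions(target, steps=4, seed=0):
    """Toy parallel refinement (sketch): begin with every position masked
    and reveal a fraction of positions per step, in a random order rather
    than left to right. `target` stands in for the model's predictions."""
    rng = random.Random(seed)
    tokens = ["<mask>"] * len(target)
    order = list(range(len(target)))
    rng.shuffle(order)
    per_step = max(1, len(target) // steps)
    for s in range(steps):
        for i in order[s * per_step:(s + 1) * per_step]:
            tokens[i] = target[i]  # pretend the model predicted correctly
    return tokens

print(denoise_all_positions(["def", "add", "(", "a", ",", "b", ")", ":"]))
```

A real diffusion language model would re-predict every position each step and keep only confident ones; the point here is that all positions update in parallel.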
Diagram of Coconut, a method training LLMs to process thought chains as vectors, comparing it to Chain-of-Thought (CoT).

Reasoning in Vectors, Not Text: Meta introduces Chain of Continuous Thought (Coconut) to improve next-token prediction

Large language models can improve their performance by generating a chain of thought (CoT), intermediate text tokens that break the process of responding to a prompt into a series of steps. Coconut instead lets a model carry out those intermediate steps as continuous vectors, feeding hidden states back into the model rather than decoding them into text.
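The Coconut-style loop can be sketched as follows: instead of sampling a text token and re-embedding it, the model's last hidden state is fed straight back in as the next input embedding for a few latent "thought" steps. The function below is a minimal sketch; `transform` stands in for one forward pass of the LLM, and all names are illustrative.

```python
def continuous_thought_rollout(hidden, transform, n_thoughts=3):
    """Coconut-style latent reasoning (sketch): reuse the last hidden
    state as the next input embedding for n_thoughts steps, never
    rendering the intermediate "thoughts" as text tokens."""
    states = [hidden]
    for _ in range(n_thoughts):
        hidden = transform(hidden)  # latent thought, stays in vector space
        states.append(hidden)
    return states

# Toy stand-in for a forward pass: scale the 2-d "hidden state".
states = continuous_thought_rollout([1.0, -1.0], lambda h: [0.5 * x for x in h])
print(len(states), states[-1])
```

After the latent steps, a real model would switch back to ordinary token decoding to produce the final answer.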
A person typing a prompt in an AI-powered mobile app with a button to improve the input.

Mobile Apps to Order: Replit’s agent-powered mobile app expands to full app development

Replit, an AI-driven integrated development environment, updated its mobile app to generate other mobile apps to order.
AI model comparison on reasoning and test-time compute across math, science, and coding benchmarks.

Grok 3 Scales Up: Grok 3, xAI’s new model family, improves on its predecessors, adds reasoning

xAI’s new model family suggests that devoting more computation to training remains a viable path to building more capable AI.