Machine Learning Research

518 Posts

Flowchart showing Tiny Recursive Model process with stages: input, prediction, and latent refinement.

Small Models Solve Hard Puzzles: Tiny Recursive Model beats larger competitors at games like Sudoku and Maze

Large language models often fail at puzzles like Sudoku, for which a solution includes multiple elements and a single mistake invalidates all of them. Researchers showed that a tiny network, by repeatedly refining its solution, can solve this sort of puzzle well.
Table comparing Nova 2 Pro to other models in reasoning, coding, perception, and workflows.

Amazon Steps Forward: Nova 2 family boosts cost-effective performance, adds new agentic features

Amazon raised the competitive profile of its foundation models and added services for custom model training and an agent platform for browser automation.
Table highlights Opus 4.5’s superior scores in coding and reasoning compared to other AI models.

Claude Does More With Fewer Tokens: Claude Opus 4.5 retakes the coding crown at one-third the price of its predecessor

Claude Opus 4.5, the latest version of Anthropic’s flagship model, extends the earlier version’s strengths in coding, computer use, and agentic workflows while generating fewer tokens.
In a lab, four robots move a metal frame using graph neural network coordination on a platform.

Coordinating Robot Teams: Google DeepMind’s RoboBallet project blends GNNs with RL to drive 8-armed robots

In factories, where teams of robotic arms work in tight spaces, their motions are programmed by hand to keep them from interfering with one another. Researchers automated this programming using graph neural networks trained via reinforcement learning.
Graph shows Ernie-4.5 outperforming competitors in document understanding and visual reasoning tasks.

Baidu’s Multimodal Bids: Giant Ernie 5 natively generates multiple media; Ernie-4.5-VL-28B-A3B-Thinking tops Vision-Language metrics

Baidu debuted two models: a lightweight, open-weights, vision-language model and a giant, proprietary, multimodal model built to take on U.S. competitors.
GIF showing a 360° walkthrough of a conference room with a wooden table, high-back chairs, wall screens, and ceiling lights.

Generated, Editable Virtual Spaces: World Labs makes Marble world model public, adds Chisel editing tool

Models that generate 3D spaces typically produce them as users move through them, without creating a persistent world that can be explored later. A new model produces 3D worlds that can be exported and modified.
GIF showing AI object detection tagging penguins on a beach, cars in traffic, and dancing people.

Open 3D Generation Pipeline: Meta’s SAM 3 image segmentation models can analyze and create bodies and other objects

Meta’s Segment Anything Model (SAM) for image segmentation has evolved into an open-weights suite for generating 3D objects. SAM 3 segments images, SAM 3D turns the segments into 3D objects, and SAM 3D Body produces 3D models of any people among the segments. You can experiment with all three.
Diagram shows AI traits with pipelines for "evil" vs. "helpful" responses to user queries on animal treatment.

Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.
Table shows Gemini 3 Pro leading in benchmarks, outperforming Gemini 2.5, Claude Sonnet 4.5, and GPT-5.1.

Google Dominates Arena Leaderboards (For the Moment): Gemini 3 Pro and Nano Banana Pro boast best-in-class multimodal reasoning and image generation

Google introduced Gemini 3 Pro and Nano Banana Pro, its flagship vision-language and image-generation models, and deployed them to billions of users worldwide.
Image illustrates the Self-Search method, simulating web searches to improve model accuracy in tests.

More-Efficient Agentic Search: Researchers fine-tune models to search their own parameters to boost recall

Large language models may have learned knowledge that’s relevant to a given prompt, but they don’t always recall it consistently. Fine-tuning a model to search its parameters as though it were searching the web can help it find knowledge in its own weights.
Visual map outlines cybercrime operation phases, highlighting AI-driven processes and human validation steps.

Anthropic Cyberattack Report Sparks Controversy: Security researchers question whether coding agents allow unprecedented automated attacks

Independent cybersecurity researchers pushed back on a report by Anthropic that claimed hackers had used its Claude Code agentic coding system to perpetrate an unprecedented automated cyberattack.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between calls.
White Waymo vehicle near water, city skyline visible; displays autonomous service for urban freeways.

Self-Driving Cars on U.S. Freeways: Waymo deploys autonomous cars on California and Arizona expressways

Waymo became the first company to offer fully autonomous, driverless taxi service on freeways in the United States.
Series of graphs transformed via tokenization and transformer layers, resulting in predicted outputs.

Forecasting Multiple Time Series: Amazon’s Chronos-2 sorts out tangled variables to make better predictions

Transformers are well suited to predicting future values of time series like energy prices, wages, or weather, but often, as in those examples, multiple time series influence one another. Researchers built a model that can forecast multiple time series simultaneously.
AI models are compared on a graph showing benchmark accuracy from 20% to 100%, highlighting GPT-5's rise.

The Year AI Went Industrial: The State of AI Report 2025 says AI’s barriers aren’t technological but social and material

A year-in-review report heralds the dawn of AI’s industrial era.