Jul 03, 2026

6 Posts

Robot arm successfully places pot on cloth, demonstrating reward verification and scoring effectiveness.

Jul 03, 2026

Better Reward Models for Robots: Inside RoboReward, a family of vision-language reward models that train robots to take action

When you’re training a robot via reinforcement learning, a handcrafted reward function is labor-intensive to build but often dispenses rewards more effectively than a general-purpose reward model based on a vision-language model. Researchers built reward models that narrowed the gap.

The table shows MAI-Thinking-1 leading in several benchmarks, compared to other AI models.

Jul 03, 2026

Microsoft Strikes Out on Its Own: Microsoft revealed MAI-Thinking-1, a Claude Sonnet 4.6-sized reasoning model developed without distillation

Microsoft, once OpenAI’s exclusive partner and still a major reseller of other companies’ AI models, built its own reasoning model from scratch.

Six charts show Fugu and Fugu Ultra scoring highest, marked by red bars, on various tasks and benchmarks.

Jul 03, 2026

Fugu Blends Models Task by Task: Sakana debuted dedicated orchestrator models, Fugu and Fugu-Ultra, that spawn Claude, Gemini, and GPT agents

Models that orchestrate the activities of other models and agents achieved state-of-the-art performance on a variety of benchmarks, outperforming the best individual models working alone.

Jul 03, 2026

GPT-5.6 Lands in Limbo: OpenAI previewed three GPT-5.6 models (Sol, Terra, and Luna), with a wider release coming soon

OpenAI announced a preview of its GPT-5.6 family, including a top-tier model comparable to Claude 5 Mythos, but so far it’s available only to users selected by the U.S. government.

Cartoon features a man and woman contemplating three luxury sports cars: blue, red, and orange on a blue background.

Jul 03, 2026

How We Decide What Courses to Teach: The AI world is full of hype and sales pitches. DeepLearning.AI focuses on the most important tools and techniques in ways you can apply to any AI vendor’s ecosystem.

The AI world has become incredibly noisy. Social media, traditional media, and an army of marketers produce a cacophony of hype and content that are often secretly sales pitches for their products.

Jul 03, 2026

OpenAI's GPT-5.6 Family, New Ways to Train Robots, Models Invoking Models

The Batch News & Insights: The AI world has become incredibly noisy. Social media, traditional media, and an army of marketers produce a cacophony of hype and content that are often secretly sales pitches for their products.

