AI Agents

62 Posts

Diagram depicts a math problem-solving workflow from problem generation to verification and revision.

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Cursor hovers over a button labeled "Submit" on a platform showing task ratings and a typed approval note.

Management for Agents: OpenAI’s Frontier agent insights and orchestration platform launches to select customers

Managers need to understand how their subordinates get work done, what resources they require, and what they accomplish. OpenAI’s latest product aims to fulfill this need when the teammates are AI agents.
A performance table shows Claude Opus 4.6 outperforming competitors in terminal coding, computer use, tool use, search, and problem-solving.

Claude Opus 4.6 Reasons More Over Harder Problems: Anthropic updates flagship model, places first on Intelligence Index

Anthropic updated its flagship large language model to handle longer, more complex agentic tasks.
Chart showing AI system audit assurance levels ranging from limited to very high, with increasing access and more comprehensive assessments.

Toward Consistent Auditing of AI: OpenAI alumni found Averi to set standards for AI model audits

AI is becoming ubiquitous, yet no standards exist for auditing its safety and security to make sure AI systems don’t assist, say, hackers or terrorists. A new organization aims to change that.
Flowchart showing Dr. CaBot's AI reasoning process for diagnosing and suggesting medical tests.

More Robust Medical Diagnoses: Inside Dr. CaBot, an agent trained to diagnose complex conditions

AI models that diagnose illnesses typically generate diagnoses based on descriptions of symptoms. In practice, though, doctors must be able to explain their reasoning and plan next steps. Researchers built a system that accomplishes these tasks.
Flowchart showing Kimi K2.5 AI orchestrating tasks among various specialized subagents.

Kimi K2.5 Creates Its Own Workforce: Moonshot AI takes the open model crown with vision updates, aided by subagents

An open source vision-language model unleashes minion agents that enable it to perform tasks more quickly and effectively.
A post on a forum titled "Can my human legally fire me for refusing unethical requests?"

Agents Unleashed: Cutting through the OpenClaw and Moltbook hype

The OpenClaw open-source AI agent became a sudden sensation, inspiring excitement, worry, and hype about the agentic future.
UCP diagram outlines processes, from product discovery to identity linking and order management.

Shopping Protocols for AI Agents: Google’s open-source UCP (Universal Commerce Protocol) standardizes AI transactions

Google introduced an open-source protocol designed to enable AI agents to help consumers make purchases online, from finding items to returning them if necessary.
Apple logo side by side with Google's logo, symbolizing their AI partnership.

Apple’s Foundation Models Will Be Gemini: Apple announced a partnership with Google to power Siri and other AI features

Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.
View from a car on a tree-lined street, with an overlay instructing to decelerate if hazards are detected.

Training Cars to Reason: Nvidia’s Alpamayo-R1 is a robotics-style reasoning model for autonomous vehicles

Chain-of-thought reasoning can help autonomous vehicles decide what to do next.
Meta's infinity loop logo adorned with a snapping hand icon on a light gradient background.

Meta Moves to Buy Agent Tech: Meta strikes a deal to acquire Manus, a Singapore-based agentic AI startup with Chinese origins

A high-profile acquisition could enable Facebook, Instagram, and WhatsApp to offer built-in agents that do users’ bidding.
Graph with 10 colored lines shows topic ranks monthly, based on a Microsoft study of Copilot usage.

Copilot’s Users Change Hour to Hour: Microsoft study shows people use AI very differently at different times or on different devices

What do users want from AI? The answer depends on when and how they use it, a new study shows.
Diagram showing SCP hub linking clients with databases, tools, AI agents, and lab devices for experiments.

Lingua Franca for Science Labs: SAIL’s Science Context Protocol helps AI agents communicate about local and virtual experiments

An open protocol aims to enable AI agents to conduct scientific research autonomously across disciplinary and institutional boundaries.
Tanmay Gupta smiles next to a whiteboard filled with mathematical formulas.

From Prediction to Action by Tanmay Gupta: Tanmay Gupta of the Allen Institute on building AI for long-horizon tasks

AI research in 2026 should confront a simple but transformative realization: Models that predict are not the same as systems that act. The latter is what we actually need.
Mice explore a laptop keyboard with code on the screen; festive lights and presents fill the background.

Agents Write Code Faster, Cheaper: Software developers used more versatile AI-powered tools to write code

Coding apps moved beyond autofill-style code completion to agentic systems that manage a wide range of software development tasks.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox