AI Agents

Flowchart depicting LLMs memorizing and responding to state media, affecting language-specific outputs.
AI Agents

Qwen3.7-Max Adds Speed and Power: Alibaba's latest proprietary model challenges U.S. rivals

Alibaba updated its flagship large language model for long-running agentic work, pushing it into the top rank among LLMs built in China.
Doughnut chart shows 77% of agentic traffic in 2025 went to product search pages.
AI Agents

Agents Surf the AI-Written Web: Internet traffic driven by AI tripled last year, study shows

AI-driven activity on the internet rose sharply last year, a study shows.
Gemini 3.5 Flash shows improved performance, surpassing previous model scores in most benchmarks.
AI Agents

Gemini 3.5 Flash Pairs Smarts With Speed: Google's updated Flash levels up, approaching top models but raising prices

Google’s faster model brings substantive gains at a substantially higher price, part of a rising trend in prices per token.
The chart compares AI benchmark efforts with employment and capital in U.S. job sectors, highlighting discrepancies.
AI Agents

Toward Agent Benchmarks That Reflect Human Work: AI agents may not be getting better at the full range of economically valuable labor

AI agents seem to be increasingly capable of performing economically valuable tasks, but current benchmarks measure this capability only narrowly.
A woman in martial arts attire faces off against a cartoon lobster in a futuristic cityscape.
AI Agents

Hermes Agent Challenges OpenClaw: OpenClaw created a class of personal agents; upstart Hermes Agent is outworking it

OpenClaw, the immensely popular AI agent, has fast-rising competition.
Vibrant dragon with brush poised, signifies China's decisive action in blocking the tech acquisition.
AI Agents

China Nixes Meta-Manus Tie-Up: State regulators block acquisition of an agentic startup headquartered in Singapore

China shut down Meta’s attempt to acquire agentic technology that originated within its borders, a blow to further technical interchange and investment between China and the U.S. 
Graph depicts GPT-Realtime-2's performance across sectors, competing with other speech-to-speech models.
AI Agents

OpenAI Challenges Speech-to-Speech Leaders: Realtime API updates audio models that reason, transcribe, and translate

An update of OpenAI’s speech-to-speech model lets developers tune the tradeoff between speed and reasoning.
Graphs compare human and LLM performance strategies in rock-paper-scissors, highlighted by stars.
AI Agents

Strategic Thinking in LLMs vs. Humans: Researchers at UT-Austin and Google model human decision-making in Rock-Paper-Scissors

While large language models can behave in human-like ways, the similarities are superficial. A simple strategy game revealed clear differences in their strategic approaches.
Table highlights Kimi K2.6's dominance in agentic tasks with 86.3 and coding at 58.6, surpassing other models.
AI Agents

Kimi K2.6 Challenges Open-Weights Champs: Kimi K2.6 matches open Qwen3.6 Max and DeepSeek V4, falls just behind top closed models

Moonshot AI’s updated Kimi model handles longer autonomous coding sessions and scales up its multi-agent orchestration relative to its predecessor.
GLM-5.1 excels in SWE-Bench Pro and Terminal-Bench 2.0, leading in coding and reasoning tests.
AI Agents

GLM-5.1 Aims for Long-Running Tasks: Z.ai’s GLM-5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Z.ai updated its flagship open-weights large language model to work autonomously on single tasks for up to eight hours.
A black box with a red symbol is open, revealing a glowing interior, symbolizing a security breach.
AI Agents

Inside Claude Code: Claude Code’s source code leaked, exposing potential future features Kairos and autoDream

The inner workings of the popular coding agent Claude Code are available for all to see.
Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.
AI Agents

Context As An External Variable: Recursive Language Models offer a path to dramatically expand beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.
Repeated neon OpenAI logos on a dark, starry backdrop, enclosed by a light-blue cloud frame.
AI Agents

OpenAI Tracks Agent States on AWS: OpenAI’s deal with Amazon to build a stateful runtime environment for AI agents

OpenAI partnered with Amazon to build infrastructure for agents on the world’s largest cloud platform, a further sign that its close relationship with Microsoft is weakening.
Diagram depicts a math problem-solving workflow from problem generation to verification and revision.
AI Agents

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Cursor hovers over a button labeled "Submit" on a platform showing task ratings and a typed approval note.
AI Agents

Management for Agents: OpenAI’s Frontier agent insights and orchestration platform launches to select customers

Managers need to understand how their subordinates get work done, what resources they require, and what they accomplish. OpenAI’s latest product aims to fulfill this need when the teammates are AI agents.