AI Agents

44 Posts

Image illustrates the Self-Search method, simulating web searches to improve model accuracy in tests.

More-Efficient Agentic Search: Researchers fine-tune models to search their own parameters to boost recall

Large language models may have learned knowledge that’s relevant to a given prompt, but they don’t always recall it consistently. Fine-tuning a model to search its parameters as though it were searching the web can help it find knowledge in its own weights.
Visual map outlines cybercrime operation phases, highlighting AI-driven processes and human validation steps.

Anthropic Cyberattack Report Sparks Controversy: Security researchers question whether coding agents allow unprecedented automated attacks

Independent cybersecurity researchers pushed back on a report by Anthropic that claimed hackers had used its Claude Code agentic coding system to perpetrate an unprecedented automated cyberattack.
Chart highlights Kimi K2’s top performance in agentic tasks, outperforming rivals in reasoning and coding.

Top Agentic Results, Open Weights: Kimi K2 Thinking outperforms proprietary models with new techniques for agentic tool use

The latest open-weights large language model from Moonshot AI challenges top proprietary LLMs at agentic tasks by executing hundreds of tool calls sequentially and pausing to think between each.
Chart displays MiniMax-M2 with high intelligence and competitive pricing, outshining other models.

Open-Weights Coding Leader: MiniMax-M2’s lightweight footprint and low costs belie its top performance

An open-weights model from Shanghai-based MiniMax challenges top proprietary models on key benchmarks for coding and agentic tasks.
Flowchart details GEPA algorithm, featuring candidate filtering and performance improvement loops.

Better Agentic Prompts Automatically: Authors devised GEPA, an algorithm for better prompts to improve agentic systems’ performance

Honing an agent’s prompt can yield better results than fine-tuning the underlying large language model via reinforcement learning.
Comparison table highlighting Claude Sonnet 4.5's top scores in coding and reasoning benchmarks, featuring improved capabilities.

Claude Levels Up: Anthropic launches Claude Sonnet 4.5 and the Claude Agent SDK, and overhauls Claude Code for developers

Anthropic updated its mid-size Claude Sonnet model, making it the first member of the Claude family to reach version 4.5. It also enhanced the Claude Code agentic coding tool with long-desired features.
FanDuel mobile app screens showing live betting odds and AI-powered sports wagering tools.

Sports Betting Goes Agentic: Gambling sites roll out AI tools that predict wins and track bets for sports fans

AI agents are getting in on the action of online sports gambling.
Central AI agent icon links to merchant, cart, and payment symbols, illustrating agentic payments process.

Agents of Commerce: Google’s AP2 gives developers new tools to build agentic payments

Google launched an open protocol for agentic payments that enables agents based on any large language model to purchase items over the internet.
Swarms of military drones flying in formation over a field, showing autonomous AI-driven coordination in warfare.

Drone Swarms Go to War: Ukraine experiments with small groups of low-contact, high-autonomy drones that strike on initiative

Swarms of drones that coordinate with one another autonomously have become a battlefield staple in Ukraine.
Students at Alpha School working on laptops during AI-assisted personalized learning session.

2 Hours With AI Versus 6 With Teacher: Inside Alpha School, a Texas-based program using algorithms and video monitors to teach children

A growing private school system replaces the typical 6-hour school day with 2 hours of personalized, AI-assisted education. 
Charts showing PromptGuard 2 blocking attacks, AlignmentCheck detecting goal hijacking, and CodeShield finding insecure code.

Cybersecurity for Agents: Meta releases LlamaFirewall, an open-source defense against AI hijacking

Autonomous agents built on large language models introduce distinct security concerns. Researchers designed a system to protect agents from common vulnerabilities.
Hand holding a Pixel 10 smartphone using Google’s Magic Cue AI assistant, which suggests a location reply in a text message conversation.

Proactive AI Assistance for Phones: Inside Magic Cue, Google’s new AI assistant for Pixel 10

Google’s latest smartphone sports an AI assistant that anticipates the user’s needs and presents helpful information without prompting.
Diagram showing SWE-Smith AI pipeline for generating synthetic coding tasks from real repositories using multiple strategies

Training Data for Coding Assistants: Stanford and Alibaba build bug-fixing dataset and pipeline to train AI

A bottleneck in fine-tuning large language models for software engineering is building a dataset that can show them how to edit code, search for subroutines, write test scripts, control a terminal, manage a file system, and so on. Researchers built a pipeline that produces such data automatically.
A fully autonomous surgical robot clips and cuts a bile duct in an ex-vivo gallbladder removal experiment using the da Vinci system, guided by AI.

Robot Surgeon Cuts and Clips: Doctors at Stanford, Johns Hopkins, and Optosurgical operate on animal organs without human intervention

An autonomous robot performed intricate surgical operations without human intervention.
Diagram showing AlphaEvolve’s agentic loop where LLMs iteratively improve code based on human-defined goals and evaluations.

Agentic System for Harder Problems: Google’s AlphaEvolve uses LLMs and evolutionary code to solve complex math and speed up Gemini training

LLMs can struggle with difficult algorithmic or scientific challenges when asked to solve them in a single attempt. An agentic workflow improved one-shot performance on hard problems both theoretical and practical.