AI Agents

22 Posts

Phishing for Agents: Columbia University researchers show how to trick trusting AI agents with poisoned links

Researchers identified a simple way to mislead autonomous agents based on large language models.

Google I/O Overdrive: Google’s new AI offerings include the Veo 3 video generator, the lightweight Gemma 3n, updates to Gemini Pro and Ultra, and more

Google revamped its roster of models, closed and open, and added more AI-powered features to its existing products.

Claude 4 Advances Code Generation: Anthropic debuts Claude Sonnet 4 and Claude Opus 4, with top scores on coding benchmarks

Anthropic continued its tradition of building AI models that raise the bar in coding tasks.

Your Robot Dev Team: OpenAI introduces Codex, a multi-agent, cloud-based software engineering tool in ChatGPT

OpenAI launched an agentic software-development system.

Up-and-Coming Startups: AI agents and infrastructure dominate CB Insights’ Top 100 AI Startups list

AI agents and infrastructure made a strong showing on CB Insights’ latest list of the top 100 AI startups.

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.

Science Research Proposals Made to Order: AI Co-Scientist, an agent that generates research hypotheses, aiding drug discovery

An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.

Mobile Apps to Order: Replit’s agent-powered mobile app expands to full app development

Replit, an AI-driven integrated development environment, updated its mobile app so it can build other mobile apps to order.

Tree Search for Web Agents: How tree search improves AI agents’ ability to browse the web and complete tasks

Browsing the web to achieve a specific goal can be challenging for agents based on large language models and even for vision-language models that can process onscreen images of a browser.

Agents Go Deep: OpenAI’s Deep Research agent generates detailed reports by analyzing web sources

OpenAI introduced a state-of-the-art agent that produces research reports by scouring the web and reasoning over what it finds.

Okay, But Please Don’t Stop Talking: Moshi, an open alternative to OpenAI’s Realtime API for speech

Even cutting-edge, end-to-end, speech-to-speech systems like ChatGPT’s Advanced Voice Mode tend to get interrupted by interjections like “I see” and “uh-huh” that keep human conversations going. Researchers built an open alternative that’s designed to go with the flow of overlapping speech.

Training for Computer Use: UI-TARS shows strong computer use capabilities in benchmarks

As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.

Computer Use Gains Momentum: OpenAI’s Operator automates online tasks with a new AI agent

OpenAI introduced an AI agent that performs simple web tasks on a user’s behalf.

Mustafa Suleyman: Agents of action

In 2025, AI will have learned to see, it will be way smarter and more accurate, and it will start to do things on your behalf.