AI Agents

30 Posts

Diagram showing AlphaEvolve’s agentic loop where LLMs iteratively improve code based on human-defined goals and evaluations.

Agentic System for Harder Problems: Google’s AlphaEvolve uses LLMs and evolutionary code to solve complex math and speed up Gemini training

LLMs can struggle with difficult algorithmic or scientific challenges when asked to solve them in a single attempt. An agentic workflow outperformed one-shot attempts on hard problems, both theoretical and practical.
Bar chart comparing LLMs like Kimi-K2, GPT-4.1, Claude, and Gemini on coding, tool use, and math benchmarks.

Born To Be Agentic: Moonshot releases Kimi K2, a trillion-parameter model fine-tuned for agentic tool use

An agent’s performance depends not only on an effective workflow but also on a large language model that excels at agentic activities. A new open-weights model focuses on those capabilities.
Puzzle with logos of Google, Windsurf, Cognition AI, and OpenAI symbolizing shifts in AI coding tool partnerships.

Powers Realign in AI-Assisted Coding: Google, Cognition carve up Windsurf after OpenAI’s failed $3B acquisition bid

A $3 billion bid by OpenAI to acquire Windsurf, maker of the AI-assisted integrated development environment of the same name, collapsed at the 11th hour, setting off a tumultuous few days of corporate maneuvering.
Chart showing failure modes in multi-agent systems, grouped into failures of specification, coordination, and task verification.

More Robust Multi-Agent Systems: Researchers improve multi-agent systems by studying how they tend to fail

Researchers addressed weaknesses in existing multi-agent frameworks. Their systems achieved scientific and technical breakthroughs.
An automated data-generation pipeline for producing web-agent training data. LLMs generate browser-based tasks, attempt to execute them, and evaluate the results.

Generated Data for Training Web Agents: Researchers scale up production of training data for web agents

Developing an agent that navigates the web can involve a lot of human effort spent annotating training examples to fine-tune the agent’s LLM component. Scientists automated the production of data that fine-tuned LLMs effectively for web tasks.
Illustration of AI replacing office workers, showing a funnel of blue figures being filtered into people at desks using laptops.

CEOs Look to AI to Replace Workers: Amazon and other major companies say AI will reduce their corporate workforces

Leaders at some of the biggest U.S. corporations say they’re preparing for AI to eliminate many jobs within their organizations.
Biomni AI agent analyzes oncogenic pathways using genomics tools like Scanpy and CellxGene.

A Research Agent for All Biology: Biomni, an AI agent for multidisciplinary biology research

An agent designed for broad biological research could accelerate the work of scientists in specialties from anatomy to zoology.
Meta and Scale AI partnership logo symbolizing Meta’s $14.3B investment to boost its AI capabilities.

Meta Befriends Scale AI: Meta invests $14.3 billion in Scale AI, hires CEO Alexandr Wang

Meta hired the leadership of Scale AI and put billions into the data-labeling startup to accelerate its AI efforts.
Diagram showing how a language model agent gets misled by malicious posts and sites when searching for Nike shoes online.

Phishing for Agents: Columbia University researchers show how to trick trusting AI agents with poisoned links

Researchers identified a simple way to mislead autonomous agents based on large language models.
Side-by-side of a fern leaf and its digital code representation, illustrating nature's pattern-to-code transformation.

Google I/O Overdrive: Google’s new AI offerings include Veo 3 video generator, lightweight Gemma 3n, updates to Gemini Pro and Ultra, and more

Google revamped its roster of models, closed and open, and added more AI-powered features to its existing products.
AI model performance comparison chart: Claude Opus 4, Sonnet 4, Sonnet 3.7, OpenAI o3, GPT-4.1, and Gemini 2.5 Pro.

Claude 4 Advances Code Generation: Anthropic debuts Claude Sonnet 4 and Claude Opus 4, with top scores on coding benchmarks

Anthropic continued its tradition of building AI models that raise the bar in coding tasks.
Chat interface discussing code error with special character filenames. Terminal shows Unix commands for troubleshooting.

Your Robot Dev Team: OpenAI introduces Codex, a multi-agent cloud-based software engineering tool in ChatGPT

OpenAI launched an agentic software-development system.
CB Insights AI 100 2025 infographic showing top AI startups across sectors like healthcare, robotics, and infrastructure.

Up-and-Coming Startups: AI agents and infrastructure dominate CB Insights’ Top 100 AI Startups list

AI agents and infrastructure made a strong showing on CB Insights’ latest list of the top 100 AI startups.
Diagram of Model Context Protocol showing MCP client-server architecture, APIs, and local and remote data sources.

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.
AI co-scientist workflow diagram showing a research goal assigned to specialized AI agents for hypothesis testing and ranking.

Science Research Proposals Made to Order: AI Co-Scientist, an agent that generates research hypotheses, aiding drug discovery

An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox