Oct 22, 2025

6 Posts

Robot bakes pizza at 1000 degrees for 5 hours, causing a fire, illustrating mistake in error analysis.
Oct 22, 2025

Improve Agentic Performance with Evals and Error Analysis, Part 2: Best practices for error analysis in agentic AI development, and how LLMs make them easier

In last week’s letter, I explained how effective agentic AI development needs a disciplined evals and error analysis process, and described an approach to performing evals.
Robot bakes pizza at 1000 degrees for 5 hours, causing a fire, illustrating mistake in error analysis.
Oct 22, 2025

Ling-1T Leads Non-Reasoning Performance, MCP Poses Security Risks, California Regulates AI, Auto-Tune for Agentic Prompts

The Batch AI News and Insights: In last week’s letter, I explained how effective agentic AI development needs a disciplined evals and error analysis process, and described an approach to performing evals.
Flowchart details GEPA algorithm, featuring candidate filtering and performance improvement loops.
Oct 22, 2025

Better Agentic Prompts Automatically: Authors devised GEPA, an algorithm for better prompts to improve agentic systems’ performance

Honing an agent’s prompt can yield better results than fine-tuning the underlying large language model via reinforcement learning.
Official letterhead displays Governor Newsom's signature on AI regulation bill establishing state oversight.
Oct 22, 2025

California Builds AI Regulatory Regime: The U.S.’s biggest state by population and economy passed four AI transparency bills in less than one month

In the absence of national laws that specifically regulate AI in the United States, California moved to regulate the technology within its own borders, passing four bills in less than a month.
Graph showing increasing security risks from 9% to 92% as MCP servers rise from 1 to 10.
Oct 22, 2025

MCP Poses Security Risks: Experts identify holes in the popular Model Context Protocol that attackers can exploit to access data

The ability to easily connect large language models to tools and data sources has made Model Context Protocol popular among developers, but it also opens security holes, research shows.
A performance comparison table highlights Ling-1T's success in reasoning and coding tasks against rivals.
Oct 22, 2025

Reasoning Without “Thinking”: All about Ant Group’s Ling-1T, an open, non-reasoning model that outperforms closed competitors

Reasoning models typically learn to undertake a separate process of “thinking” through their output before they produce a final response. Ant Group built a top non-reasoning model that can take similar steps as part of its immediate response.
