Apr 16, 2025

6 Posts

Cartoon of two coworkers coding; one struggles with evaluations, the other iterates quickly through model updates and test cases.

Apr 16, 2025

We Iterate on Models. We Can Iterate on Evals, Too: Building automated evals doesn’t need to be a huge investment. Start with a few quick-and-dirty examples and iterate!

I’ve noticed that many GenAI application projects introduce automated evaluations (evals) of the system’s output later than they should, and rely on humans to manually examine and judge outputs longer than they should.

Apr 16, 2025

Google Unveils Gemini 2.5, MCP Gains Momentum, Behind Sam Altman’s Fall and Rise, LLMs That Understand Misspellings

The Batch AI News and Insights: I’ve noticed that many GenAI application projects introduce automated evaluations (evals) of the system’s output later than they should, and rely on humans to manually examine and judge outputs longer than they should.

Diagram of latent transformer model using byte-level encoding, patching, and cross-attention for next-byte prediction.

Apr 16, 2025

Toward LLMs That Understand Misspellings: New byte-based model beats Llama 3 on spelling, noise, and translation

Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R's in strawberry, and potentially better able to understand unfamiliar languages that might share groups of letters with familiar languages.

Illustration of a businessman in a blue suit sitting alone at the head of a long boardroom table with black chairs.

Apr 16, 2025

The Fall and Rise of Sam Altman: Inside Sam Altman’s brief ouster from OpenAI

A behind-the-scenes account provides new details about the abrupt firing and reinstatement of OpenAI CEO Sam Altman in November 2023.

Diagram of Model Context Protocol showing MCP client-server architecture, APIs, and local and remote data sources.

Apr 16, 2025

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.

AI benchmark comparison chart showing Gemini 2.5 Pro, GPT-4.5, Claude, Grok, and others across science, math, code, and reasoning.

Apr 16, 2025

Google Unveils Gemini 2.5: Google’s Gemini 2.5 Pro Experimental outperforms top AI models

Google’s new flagship model raised the state of the art in a variety of subjective and objective tests.
