Claude’s Haiku Boasts Top Performance, Fast; Open Speech Recognition Models Top New Leaderboard

Published Oct 17, 2025 · 4 min read

In today’s edition of Data Points, you’ll learn more about:

  • Alibaba’s small, edge-optimized vision-language models
  • Microsoft’s new image generator
  • GitHub’s free kit for spec-driven development
  • ChatGPT’s new automated memory manager

But first:

Anthropic updates Claude’s Haiku small model to 4.5

Anthropic claims the new Haiku 4.5 performs coding tasks at levels comparable to Claude Sonnet 4 from five months ago while costing one-third as much and running more than twice as fast. The model outperforms Sonnet 4 in areas like computer use, basic math, and agentic coding. The release shows how AI capabilities that were recently considered advanced are becoming cheaper and faster to deploy, making them better suited for agentic applications. Claude Haiku 4.5 costs $1 per million input tokens and $5 per million output tokens, and is available through the Claude API, Amazon Bedrock, and Google Cloud Vertex AI. (Anthropic)
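Calling the new model looks like any other Anthropic Messages API request. Here's a minimal Python sketch, assuming the model ID "claude-haiku-4-5" (check Anthropic's model list for the exact string), with a back-of-the-envelope cost estimate at the listed prices:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-haiku-4-5",  # assumed model ID; confirm against Anthropic's docs
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)
print(response.content[0].text)

# Rough cost at the listed prices: $1 per million input tokens, $5 per million output tokens
usage = response.usage
cost = usage.input_tokens * 1e-6 * 1.00 + usage.output_tokens * 1e-6 * 5.00
print(f"Approximate cost: ${cost:.6f}")
```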

New leaderboard benchmarks speech recognition systems 

Researchers at Hugging Face launched the Open ASR Leaderboard, a reproducible benchmark that evaluates over 60 open-source and proprietary automatic speech recognition systems across 11 datasets, including multilingual transcription and long-form audio. The benchmark reports both word error rate (WER) and inverse real-time factor (RTFx), enabling fair comparisons of both accuracy and processing speed. For English transcription, conformer encoders paired with large language model decoders achieved the best average WER but processed audio more slowly, while CTC and TDT decoders delivered significantly better speed (up to 6,400 times faster than real time), making them more practical for long-form and offline transcription. Nvidia’s open Canary and Parakeet models top the leaderboard for accuracy and speed respectively; surprisingly, proprietary models tend to trail open models on both measures. (arXiv and Hugging Face)
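For readers unfamiliar with the two metrics, here's a minimal Python sketch of how they're typically defined. The leaderboard's own harness also normalizes transcripts (casing, punctuation) before scoring, which this toy version skips:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed here via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per second of compute.
    Higher is faster; 6,400x means an hour of audio in just over half a second."""
    return audio_seconds / processing_seconds

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words ≈ 0.167
print(rtfx(audio_seconds=3600, processing_seconds=0.5625))                # 6400.0
```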

Alibaba releases smaller and quantized vision-language models

Alibaba’s Qwen team released Qwen3-VL models at 4 billion and 8 billion parameter scales, each available in Instruct and Thinking variants, plus FP8-quantized versions for low-VRAM deployment. The models retain most of the capabilities of larger Qwen3-VL releases, from context window to GUI agent control. The FP8 checkpoints deliver near-BF16 performance, though Transformers does not yet support direct loading—deployment requires vLLM or SGLang. The release complements Qwen’s existing 30B and 235B mixture-of-experts tiers with smaller models suitable for single-GPU and edge deployments. The models are now available under open licenses on Hugging Face and GitHub. (Hugging Face)
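Because the FP8 checkpoints can't yet be loaded directly in Transformers, a vLLM offline-inference script is one way to try them. A minimal sketch, assuming the checkpoint name Qwen/Qwen3-VL-8B-Instruct-FP8 (check the Qwen3-VL collection on Hugging Face for the exact ID):

```python
from vllm import LLM, SamplingParams

# Model ID is an assumption based on Qwen's usual naming convention.
llm = LLM(model="Qwen/Qwen3-VL-8B-Instruct-FP8", max_model_len=8192)

# OpenAI-style chat messages; vLLM accepts image_url content for multimodal models.
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        {"type": "text", "text": "Summarize the trend shown in this chart."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0].outputs[0].text)
```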

Microsoft releases first in-house text-to-image generation model

Microsoft’s MAI-Image-1 debuted in the top 10 on the LMArena text-to-image leaderboard, where it currently ranks ninth. Microsoft says the model specializes in photorealistic imagery, including complex lighting effects and landscapes, while generating images faster than many larger competing models. The release follows Microsoft’s announcement of its first two in-house models in August, part of the company’s strategy to build purpose-built AI models for integration into its products. (Microsoft)

GitHub open sources Spec Kit to improve coding agent reliability

GitHub’s Spec Kit works with AI coding agents like GitHub Copilot, Claude Code, and Gemini CLI. The toolkit addresses a common problem: coding agents often produce code that looks correct but doesn’t work, because developers treat them like search engines rather than literal-minded collaborators that need clear instructions. Spec Kit introduces a four-phase process (Specify, Plan, Tasks, and Implement) in which specifications become editable documents that guide code generation, with built-in checkpoints for developers to verify and refine AI output at each stage. The approach is especially useful for greenfield projects, for adding features to existing codebases, and for modernizing legacy systems, since it separates stable requirements from flexible implementation details. The toolkit is available now on GitHub. (GitHub)

ChatGPT updates memory management to prioritize by relevance

ChatGPT now automatically manages saved memories by keeping the most relevant details prioritized while moving less important information to the background, preventing accounts from reaching “memory full” status. The system determines which memories to prioritize based on recency and how frequently users discuss particular topics. Users can search their saved memories, sort them by date, manually adjust which memories are prioritized, and restore previous versions of saved memories. The feature also allows users to disable automatic memory management and view which memories are currently top of mind. As of this writing, OpenAI is rolling out the update to Plus and Pro subscribers globally on the web. (OpenAI)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng talked about the importance of disciplined evaluation and error analysis in AI development, emphasizing that understanding root causes of errors can lead to faster progress, and introduced best practices for evaluating agentic systems.

“With generative AI, a lot of intuitions from evals and error analysis of supervised learning carry over — history doesn’t repeat itself, but it rhymes — and developers who are already familiar with machine learning and deep learning often adapt to generative AI faster than people who are starting from scratch. But one new challenge is that the space of outputs is much richer, so there are many more ways an algorithm’s output might be wrong.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth:

  • OpenAI strengthened its ties with AMD through a multi-billion-dollar chip deal covering six gigawatts of AMD computing power plus warrants for up to roughly 10 percent of AMD's shares.
  • DeepSeek cut inference costs with DeepSeek-V3.2-Exp, which uses a "lightning indexer" to select the most relevant tokens for sparse attention, boosting efficiency on long inputs.
  • Thinking Machines simplified fine-tuning with the new Tinker API, which handles the details of distributed training so developers can fine-tune open models across many GPUs more easily.
  • MolmoAct enhanced robotic capabilities by building spatial maps that let robots plot their actions before executing text instructions.

Subscribe to Data Points

Your accelerated guide to AI news and research