All about Claude’s new Opus and Sonnet, Google’s subscriber-first features and models from I/O

Published May 23, 2025 · 4 min read

In today’s edition, you’ll learn more about:

  • The new device OpenAI is building with Jony Ive
  • Mistral’s new open-weight software engineering model
  • Falcon-Arabic, a new 7B model that excels in multiple regional dialects
  • Google’s new multimodal model for mobile devices

But first:

Anthropic introduces Claude Opus 4 and Sonnet 4 models

Anthropic released its new Claude Opus 4 and Sonnet 4 models with improvements in coding and reasoning capabilities. Opus 4 reached 72.5 percent on SWE-bench and 43.2 percent on Terminal-bench coding tests, while Sonnet 4 achieved 72.7 percent on SWE-bench, outperforming earlier Claude models and rivals from OpenAI. Both new models can now use tools during their reasoning process, execute tools in parallel, and demonstrate better memory when given access to local files. The models are available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI, with Opus 4 priced at $15/$75 per million tokens (input/output) and Sonnet 4 at $3/$15. (Anthropic)
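
For developers who want to try the new models, a minimal sketch of a call through the Anthropic Python SDK might look like the following; the model identifier is an assumption based on Anthropic's naming pattern, so check the current model list before running it.

```python
# Minimal sketch, assuming the Anthropic Python SDK and an assumed Opus 4 model ID.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-20250514",  # assumed ID; Sonnet 4 would use a corresponding sonnet-4 ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the changes in this diff: ..."}],
)
print(message.content[0].text)
```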

Google rebrands subscription to AI Pro, launches new Ultra tier

Google is renaming its AI Premium subscription to “Google AI Pro” while introducing a new high-end “Google AI Ultra” tier priced at $249.99 per month. Google AI Pro maintains its $19.99 monthly price with access to Gemini 2.5 Pro, 2TB storage, Deep Research, and Veo 2 video generation, plus new features like early access to Gemini in desktop Chrome and the Flow AI filmmaking tool. The Ultra tier includes all Pro features plus 30TB storage, YouTube Premium, highest usage limits for AI tools, and exclusive access to experimental features like Project Mariner, which can manage multiple tasks simultaneously. Google is offering an introductory price of $124.99 for Ultra’s first three months, with availability starting today in the U.S. and expanding to other countries soon. (9to5Google)

OpenAI partners with Jony Ive on AI assistant device

Sam Altman revealed to OpenAI staff that the company is developing AI “companions” with newly acquired design firm io, led by former Apple designer Jony Ive. The planned device will be aware of users’ surroundings, unobtrusive enough to fit in a pocket or on a desk, and is intended to become a third essential device alongside laptops and smartphones. Altman described the product as a “family of devices” that will integrate hardware and software similar to Apple’s approach, emphasizing that the technology will move beyond typing queries into websites. OpenAI aims to ship 100 million devices by late next year, with Altman suggesting the $6.5 billion acquisition could add $1 trillion in value to the company. (The Verge)

Mistral AI releases Devstral, an open-weight coding LLM

Mistral AI and All Hands AI launched Devstral, an agentic large language model specifically designed for software engineering tasks. The model achieves 46.8 percent on SWE-Bench Verified, outperforming other open-weight models by more than 6 percentage points and surpassing GPT-4.1-mini by over 20 percent. Unlike many LLMs that excel at isolated coding tasks, Mistral says Devstral can solve more complex software engineering problems by contextualizing code within large codebases and identifying relationships between components. The model is lightweight enough to run on a single RTX 4090 or a Mac with 32GB RAM, making it suitable for local deployment. Devstral is available for free under the Apache 2.0 license on HuggingFace, Ollama, and other platforms, or through Mistral’s API at $0.10/$0.30 per million tokens of input/output. (Mistral)
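
For local experimentation, a minimal sketch using the Ollama Python client could look like this; the `devstral` model tag is an assumption, and the model would need to be pulled first (for example with `ollama pull devstral`).

```python
# Minimal sketch, assuming a running local Ollama server and an assumed "devstral" tag.
import ollama

response = ollama.chat(
    model="devstral",  # assumed model tag on the Ollama registry
    messages=[
        {"role": "user", "content": "Locate the config parser in this repository and explain it."}
    ],
)
print(response["message"]["content"])
```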

New Arabic language model outperforms larger competitors

The Technology Innovation Institute released Falcon-Arabic, a 7B parameter language model built on the Falcon 3 architecture. The model handles Arabic, English, and several other languages with a 32,000 token context window. Testing shows Falcon-Arabic outperforms other Arabic language models of similar size and some larger models on benchmarks including Arabic MMLU, Exams, MadinahQA, and AraTrust. The developers extended the base model with 32,000 Arabic-specific tokens and used native Arabic datasets for training rather than translated content. The model supports both Modern Standard Arabic and regional dialects, addressing the relative scarcity of Arabic language AI tools. Users can test Falcon-Arabic through an online playground. (Hugging Face)
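
For those who prefer to run it themselves, a minimal sketch with Hugging Face transformers might look like the following; the repository ID is an assumption, so confirm the exact name on TII's Hugging Face page.

```python
# Minimal sketch, assuming the Hugging Face transformers library and an assumed repo ID.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/Falcon-Arabic-7B-Instruct",  # assumed repository ID
    device_map="auto",
)

prompt = "اشرح مفهوم التعلم العميق بإيجاز."  # "Briefly explain deep learning."
print(generator(prompt, max_new_tokens=200)[0]["generated_text"])
```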

Google previews Gemma 3n, a mobile-optimized multimodal model

Google unveiled Gemma 3n, a new open AI model specifically engineered for on-device use with a significantly reduced memory footprint. The model leverages per-layer embeddings technology that allows 5B and 8B parameter models to operate with just 2GB and 3GB of memory, making them suitable for phones, tablets, and laptops. Gemma 3n offers multimodal capabilities including text, image, video, and audio processing, with new features like automatic speech recognition and translation. The model was designed in collaboration with mobile hardware companies like Samsung, Qualcomm, and MediaTek to enable offline use, and will be the basis for the next version of Gemini Nano. Developers can preview Gemma 3n through Google AI Studio or Google AI Edge for on-device development. (Google)
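
A minimal sketch of querying the preview through the Google AI Studio API with the google-genai SDK might look like this; the model name is an assumption, so check the preview documentation for the exact identifier.

```python
# Minimal sketch, assuming the google-genai SDK and an assumed Gemma 3n preview model name.
from google import genai

client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")

response = client.models.generate_content(
    model="gemma-3n-e4b-it",  # assumed preview model name
    contents="In one paragraph, explain what per-layer embeddings do for on-device memory use.",
)
print(response.text)
```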


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng shared how large companies can move fast in the age of AI by creating sandbox environments that allow small teams to innovate without needing constant permission.

“If engineers need sign-off from 5 vice presidents before they’re even allowed to launch an MVP (minimum viable product) to run an experiment, how can they ever discover what customers want, iterate quickly, or invent any meaningful new product?”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: OpenAI introduced Codex, a new multi-agent, cloud-based software engineering tool integrated into ChatGPT; xAI attributed Grok’s controversial “white genocide” responses to an unauthorized modification by an unnamed employee, raising concerns about internal safeguards; U.S. tech giants including Nvidia, AMD, and Amazon secured deals to supply chips and infrastructure to Middle Eastern companies like Saudi Arabia’s Humain and the UAE’s G42; and Microsoft researchers showed that 4-bit quantized versions of Llama models can match the accuracy of 16-bit models, offering major efficiency gains without compromising performance.



Subscribe to Data Points

Your accelerated guide to AI news and research