In today’s edition, you’ll find:
- Gemini 2.5 Flash blends speed with budgeted reasoning
- IBM’s Granite Speech sets SOTA in transcription accuracy
- Recall will soon return to Copilot Plus PCs
- OpenAI will shelve its biggest, costliest model
But first:
Wikimedia releases free-to-use Wikipedia dataset on Kaggle
Wikimedia Enterprise created a dataset designed specifically for machine learning applications that provides structured Wikipedia content in English and French. The dataset offers pre-parsed article data in JSON format, eliminating the need for developers to scrape or parse raw text when building models or testing language processing pipelines. The beta release, available now on Kaggle, includes valuable content elements like abstracts, short descriptions, infobox data, image links, and segmented article sections, all freely licensed under Creative Commons Attribution-ShareAlike 4.0 and the GNU Free Documentation License. The release comes shortly after the organization revealed that Wikipedia’s hosting costs had risen sharply due to AI bots scraping its websites without permission. (Wikimedia)
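Assuming the beta dump ships as JSON Lines with one article record per line (an assumption; the field names below are illustrative guesses, not the confirmed Wikimedia Enterprise schema), a minimal Python sketch for streaming records might look like this:

```python
import json

# Hedged sketch: "name" and "abstract" are placeholder field names.
# Check the Kaggle dataset page for the actual file layout and schema.
def iter_articles(path):
    """Yield one structured-article record per line of a JSON Lines dump."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

for article in iter_articles("enwiki_structured_contents.jsonl"):
    print(article.get("name"), "-", article.get("abstract", "")[:80])
```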
OpenAI unveils smarter reasoning models with tool use
OpenAI released o3 and o4-mini, new reasoning models that can use every tool in ChatGPT’s arsenal, from web search to coding to image generation. The models show strong improvements over previous versions, with o3 setting new benchmarks in coding and math while making 20 percent fewer major errors than o1 on complex tasks. o4-mini achieves remarkable performance for its size, particularly in competition math where it scored 99.5 percent pass@1 on AIME 2025 when given access to Python. Both models are available now to ChatGPT Plus, Pro, and Team users, with Enterprise and Edu access coming next week. In the API, o4-mini costs $1.10/$4.40 per million tokens of input/output, while o3 costs $10/$40. (OpenAI)
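For back-of-the-envelope budgeting against those prices, a small helper like the following sketch shows how the per-million-token rates translate into per-request cost (the token counts are made-up examples):

```python
# Prices quoted above: o4-mini $1.10/$4.40, o3 $10/$40 per 1M input/output tokens.
PRICES = {  # (input, output) in USD per million tokens
    "o4-mini": (1.10, 4.40),
    "o3": (10.00, 40.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a request with 2,000 input tokens and 500 output tokens on o4-mini
print(f"${estimate_cost('o4-mini', 2_000, 500):.4f}")  # ≈ $0.0044
```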
Google previews Gemini 2.5 Flash, a fast multimodal model with controllable reasoning capabilities
Google launched an early preview of Gemini 2.5 Flash, the company’s first “hybrid” reasoning model where developers can toggle “thinking” on or off. Developers can set specific thinking budgets to balance quality, cost, and latency, with the model automatically determining how much reasoning to apply based on task complexity. The model performs strongly on complex reasoning tasks, ranking second only to Gemini 2.5 Pro on Hard Prompts in LMArena, but maintains what Google claims is the best price-to-performance ratio among comparable models. Gemini 2.5 Flash is currently available for free through the Gemini API in Google AI Studio and Vertex AI, with final pricing to be announced at its full release. (Google)
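A minimal sketch of setting a thinking budget with the google-genai Python SDK might look like the following; the model ID and ThinkingConfig parameter reflect Google’s preview documentation at launch and may change, so treat them as assumptions:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # preview ID at time of writing
    contents="Explain the birthday paradox in two sentences.",
    config=types.GenerateContentConfig(
        # Cap reasoning tokens; per the preview docs, a budget of 0 turns
        # "thinking" off entirely.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```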
Granite Speech 3.3 8B is IBM’s first audio-input model
IBM released Granite Speech 3.3 8B, a compact open-weights speech-to-text model offering superior transcription accuracy compared to top competitors. The model processes both audio and text inputs, providing automatic speech recognition and translation from English to seven languages including French, Spanish, German, and Mandarin. Unlike Whisper and other conventional speech models, which are limited to 30-second windows, Granite Speech can handle much longer audio, processing files of up to twenty minutes (although IBM still recommends one-minute chunks for the best accuracy). IBM plans improvements for future versions, including multilingual encoding, emotion detection, and speech-enabled multimodal models. (IBM)
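To follow IBM’s one-minute-chunk recommendation, a hedged sketch like the one below splits a long recording before transcription; the downstream transcription step is left as a placeholder rather than a claim about Granite Speech’s actual inference API:

```python
import soundfile as sf

CHUNK_SECONDS = 60  # IBM's recommended chunk length for best accuracy

def one_minute_chunks(path):
    """Yield (samples, sample_rate) pairs of roughly one minute each."""
    audio, sr = sf.read(path)
    step = CHUNK_SECONDS * sr
    for start in range(0, len(audio), step):
        yield audio[start:start + step], sr

for i, (chunk, sr) in enumerate(one_minute_chunks("meeting.wav")):
    # Placeholder: pass each chunk to your Granite Speech inference pipeline.
    print(f"chunk {i}: {len(chunk) / sr:.1f} s")
```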
Microsoft rolls out Recall feature to Windows Insiders
Microsoft began gradually rolling out its Recall feature in the Release Preview channel, signaling that the feature will soon be widely available. Recall captures screenshots of user activity on Copilot Plus PCs, allowing users to search for and find past content. The feature has faced multiple delays since June 2024 due to security concerns. Microsoft emphasizes that Recall requires explicit opt-in from users, allows pausing snapshot collection at any time, and will only be available on Copilot Plus PCs. In earlier testing phases, reviewers described the feature as “creepy, clever, and compelling.” (Microsoft and The Verge)
OpenAI to discontinue GPT-4.5 API access
OpenAI announced it will end API access to GPT-4.5, its largest AI model to date, on July 14, just months after its February release. The company recommends that developers transition to the newly launched GPT-4.1, which OpenAI claims offers “similar or improved performance [to] GPT-4.5 in key areas at a much lower cost.” While GPT-4.5 will remain available in ChatGPT for paying customers, its high operational costs likely influenced the decision to remove it from the API. The model, code-named Orion, was trained with unprecedented computing resources but falls short of “frontier model” status on several industry benchmarks, despite improvements in writing and persuasiveness over GPT-4o. (TechCrunch)
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng shared why teams should start building evaluations early — even if they’re quick and imperfect — and improve them over time to accelerate GenAI development.
“I encourage teams to think of building evals as an iterative process. It’s okay to start with a quick-and-dirty implementation (say, 5 examples with unoptimized metrics) and then iterate and improve over time.”
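In that spirit, a quick-and-dirty eval can be nothing more than a handful of hand-written cases and a crude exact-match check. The examples and the run_model stub below are hypothetical placeholders, not anything from Andrew’s letter:

```python
# Hedged sketch: five toy examples and an unoptimized substring-match metric.
EXAMPLES = [
    {"input": "Convert 3 ft to inches", "expected": "36"},
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "2 + 2 * 3 =", "expected": "8"},
    {"input": "Opposite of 'hot'", "expected": "cold"},
    {"input": "First element in the periodic table", "expected": "Hydrogen"},
]

def run_model(prompt: str) -> str:
    """Placeholder: call your LLM or application pipeline here."""
    raise NotImplementedError

def evaluate():
    correct = 0
    for ex in EXAMPLES:
        output = run_model(ex["input"])
        correct += ex["expected"].lower() in output.lower()  # crude metric
    print(f"{correct}/{len(EXAMPLES)} passed")
```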
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth:
- Google unveiled Gemini 2.5 Pro Experimental, which outperforms top AI models and continues the rapid evolution of its flagship model family
- Model Context Protocol (MCP), an open standard for tool use and data access, gained traction as OpenAI adopted it to improve LLM integration with external tools and APIs
- A book excerpt explored Sam Altman’s brief ouster and return to OpenAI, shedding light on the company’s internal power struggles
- Researchers introduced a new byte-based model that surpasses Llama 3 and other token-based models on tasks involving misspellings, noisy input, and translation