AI Giants Share Wikipedia’s Costs
Wikimedia Foundation strikes deals with Amazon, Meta, Microsoft, Mistral AI, and Perplexity

On its 25th anniversary, Wikipedia celebrated with high-profile deals that make its data easier for AI companies to use in training their models, in exchange for financial support.

What’s new: The Wikimedia Foundation announced partnerships with AI companies including Amazon, Meta, Microsoft, Mistral AI, and Perplexity. The partnership program, known as Wikimedia Enterprise, lets these partners access Wikipedia data at higher speeds and volumes than they could by scraping pages on the web. Financial terms were not disclosed.

How it works: Along with donations from users, enterprise partnerships are the Wikimedia Foundation’s chief source of revenue. Wikimedia Enterprise offers APIs that enable developers to directly access encyclopedia articles and other Wikimedia data, including Wikimedia Commons images, Wiktionary’s online dictionary, and Wikidata’s machine-readable knowledge base. Free plans allow for limited data updates and access to a support portal. Paid plans (terms are not public) include daily snapshots of Wikimedia data, potentially unlimited data requests (limits vary depending on how much a subscriber pays), streaming access to real-time revisions, and technical support from human staffers. 

  • Wikipedia data is available to all under a Creative Commons license that makes it free to use for commercial and noncommercial purposes. Its free availability and high quality have made it an important data source for training AI models. The foundation also offers an open Kaggle dataset for noncommercial AI training.
  • Wikipedia receives more requests from automated web crawlers than from human users. The site’s founder Jimmy Wales said crawlers gathering data to train AI systems had caused the foundation’s hosting, memory, and server costs to skyrocket. The foundation called for AI developers to support it financially, use the API rather than crawl the web, and attribute information derived from Wikipedia articles.
  • Microsoft, Mistral AI, and Perplexity all signed up as enterprise partners within the last year. Wikimedia’s existing partnerships with Amazon and Meta had not previously been announced. Google became a Wikimedia Enterprise partner in 2022.
  • Wikimedia also announced partnerships with some smaller companies, each of which advertises its environmentally friendly approach: Ecosia (a search engine company), Pleias (an LLM builder), and ProRata (an AI search, advertising, and attribution engine).
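
To illustrate how daily snapshots and streamed revisions can work together, here is a minimal Python sketch of a subscriber keeping a local copy current: load a snapshot, then apply revisions as they arrive. It does not call the Wikimedia Enterprise API. The `Revision` type, its field names, and the `apply_revisions` helper are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One article update. All field names here are hypothetical."""
    title: str
    revision_id: int
    text: str


def apply_revisions(snapshot, revisions):
    """Return a copy of the snapshot with newer revisions applied."""
    mirror = dict(snapshot)
    for rev in revisions:
        current = mirror.get(rev.title)
        # Keep only the newest revision of each article,
        # ignoring stale or duplicate events.
        if current is None or rev.revision_id > current.revision_id:
            mirror[rev.title] = rev
    return mirror


# A toy daily snapshot followed by a toy stream of revisions.
snapshot = {
    "Alan Turing": Revision("Alan Turing", 100, "Original text"),
}
stream = [
    Revision("Alan Turing", 101, "Corrected text"),
    Revision("Ada Lovelace", 7, "New article"),
    Revision("Alan Turing", 99, "Stale event that arrives late"),
]

mirror = apply_revisions(snapshot, stream)
```

In this toy model, the subscriber downloads the bulk of the data once per day and then processes only the changes, which is the contrast with repeatedly re-crawling every page to catch edits.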

Behind the news: Other publishers whose content is widely used to train AI systems have sought payment with varied levels of success. In 2023, Reddit and Stack Overflow announced plans to protect their data from AI crawlers while they sought licensing deals. Reddit reached licensing agreements that allow Google, OpenAI, and others to use its content to train models. Stack Overflow saw traffic and question volume plummet, dropping from 200,000 questions per month in 2014 to 50,000 questions per month in late 2025. As its audience turned from discussing technical issues on the site to asking AI models for answers, the company pivoted from advertising as its primary revenue source to repackaging its data for AI training.

Why it matters: AI companies want to train their models on Wikipedia, and gathering data by sending API calls is much faster than crawling the web — never mind the rapid pace of crawling required to keep up with the encyclopedia’s never-ending revisions. At the same time, Wikipedia needs revenue to survive. Selling API access offers a helpful service to developers while giving this crucial data source a stronger financial foundation.

We’re thinking: These deals are win-win. People who choose to read the online encyclopedia the old-fashioned way can keep doing so, and people who build AI models can rest easier knowing they won’t kill a key source of training data.
