Amazon, Meta, Microsoft, Mistral AI, and Perplexity have recently joined Wikimedia Enterprise as partners, expanding the set of companies paying for “reliable, high-throughput API access” to Wikipedia and other Wikimedia project data.
Wikimedia here refers to the nonprofit Wikimedia Foundation, which operates Wikipedia and other Wikimedia projects. Wikimedia Enterprise is the foundation’s paid offering for high-volume commercial reuse, providing reliable, high-throughput access to Wikimedia content via enterprise APIs and data feeds.
Being a Wikimedia Enterprise partner means using that contract-based access (instead of ad hoc scraping) to integrate Wikimedia data into products at scale.
Legal backdrop for AI data
The expansion comes as courts and regulators continue to weigh how copyrighted works may be used in AI training and related products. In the U.S., The New York Times alleges, in a complaint filed in federal court, that OpenAI and Microsoft copied and used Times content without permission in ways tied to model training and output.
Authors have also sued Meta over the alleged use of copyrighted books to train its Llama models, as detailed in the Kadrey v. Meta complaint and subsequent amended pleadings. The U.S. Copyright Office has separately described the stakes and the policy debate around licensing versus exceptions for AI training in its 2025 report on generative AI training.
Shift to contracted access
In parallel, some platforms have moved toward structured, contracted access for high-volume AI use. Reddit and Google, for example, said in February 2024 that they had expanded their partnership to give Google access to Reddit’s Data API for “structured” access and the ability to “train on” Reddit content, an example of the broader shift from ad hoc scraping toward permissioned pipelines.
Product and delivery modes
Wikimedia Enterprise said it offers multiple delivery modes designed for high-volume reuse: an on-demand API that returns the latest version of an article, a snapshot API with hourly-updated downloadable dumps by language, and a real-time API that streams edits as they happen. The foundation positioned the commercial feed as a way to integrate “human-governed knowledge” into products at scale.
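For illustration, here is a minimal Python sketch of what consuming these three delivery modes might look like. The hostname, endpoint paths, auth header, and streaming format below are assumptions for demonstration, not confirmed Wikimedia Enterprise API details.

```python
# Illustrative sketch only: the hostname, endpoint paths, and auth scheme
# below are assumptions for demonstration, not confirmed Wikimedia
# Enterprise API details.
import json
from urllib.parse import quote

import requests

BASE = "https://api.enterprise.example.org/v2"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # hypothetical auth

def fetch_latest_article(title: str, lang: str = "en") -> dict:
    """On-demand mode: fetch the latest revision of a single article."""
    resp = requests.get(
        f"{BASE}/articles/{quote(title)}",
        params={"language": lang},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def list_snapshots(lang: str = "en") -> list:
    """Snapshot mode: list the hourly-updated downloadable dumps for a language."""
    resp = requests.get(
        f"{BASE}/snapshots",
        params={"language": lang},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def stream_edits():
    """Real-time mode: yield edits as they happen, assuming the stream is
    newline-delimited JSON over a long-lived HTTP connection."""
    with requests.get(f"{BASE}/realtime/articles", headers=HEADERS,
                      stream=True, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:
                yield json.loads(line)
```

The distinction between the modes matters for consumers: on-demand suits per-page lookups, snapshots suit bulk ingestion, and the real-time stream lets a downstream index stay close to the live wiki without crawling.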
Infrastructure pressure and what’s next
The business push lands as Wikimedia reports mounting infrastructure pressure from automated access patterns. In an April 2025 operations post, the foundation said multimedia bandwidth demand has risen 50% since January 2024, and that the increase is “not coming from human readers” but “largely” from automated scraping of Wikimedia Commons content.
The same post said bot behavior is disproportionately costly for Wikimedia’s core data centers and that at least 65% of its most resource-consuming traffic comes from bots, even though bots represent a smaller share of overall pageviews.
Separately, Wikimedia has also flagged softening human usage. In a January 2026 update on reader behavior, the foundation said overall Wikipedia pageviews are down ~8% year-over-year, and it pointed to shifts such as readers getting answers via search, social platforms, and AI tools.
The announcement adds to the evidence that large-scale data access is shifting toward contracted, governed pipelines. Wikimedia’s messaging emphasizes operations: controlled feeds could reduce uncontrolled crawler load while providing predictable freshness and provenance for downstream use cases such as search, assistants, and retrieval-augmented generation (RAG).
This aligns with broader governance guidance: Deloitte has argued that AI output quality and trustworthiness depend on data quality, and that organizations need stronger governance for unstructured data (text, images, code) to manage reliability and ethical use.
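As a concrete illustration of the provenance point, here is a hedged Python sketch of how a RAG pipeline might carry feed metadata (source URL, revision ID, fetch time) on every indexed chunk. The field names and payload shape are assumptions, not a documented Wikimedia Enterprise schema.

```python
# Hedged sketch: attaching feed provenance to chunks before indexing for RAG.
# The payload fields ("text", "url", "revision") and the Chunk attributes are
# illustrative assumptions, not a documented Wikimedia Enterprise schema.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Chunk:
    text: str
    source_url: str   # canonical article URL, for attribution in answers
    revision_id: int  # revision the text came from, for freshness checks
    fetched_at: str   # when the feed delivered it, for staleness policies

def chunk_article(article: dict, size: int = 500) -> list[Chunk]:
    """Split an article payload into fixed-size chunks that carry provenance."""
    body = article["text"]
    meta = {
        "source_url": article["url"],
        "revision_id": article["revision"],
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
    return [Chunk(text=body[i:i + size], **meta) for i in range(0, len(body), size)]

# A RAG system indexing these chunks can cite the exact revision it answered
# from and re-fetch a page when the real-time feed reports a newer edit.
```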
Wikimedia has not disclosed pricing or commercial terms in its public Enterprise partner announcement, nor has it said whether paid access will reduce bot demand; that remains an open question.