OpenAI and Cerebras Systems have announced a multi-year partnership to add 750 megawatts of “ultra low-latency” AI compute to OpenAI’s platform, with the capacity coming online in multiple tranches through 2028 and being integrated into OpenAI’s inference stack “in phases.”

Cerebras said the rollout begins in 2026 and described the deployment as a large, staged buildout of wafer-scale systems optimized for high-speed inference, the production step where models generate responses for users.

Reported deal value

Neither OpenAI nor Cerebras disclosed commercial terms. The Wall Street Journal first reported that the agreement is valued at around $10 billion, and Reuters subsequently reported it is worth more than $10 billion, citing a source familiar with the matter.

OpenAI’s own announcement focused on product outcomes: faster responses for complex queries, code generation, image creation, and agentic workflows. It described the partnership as part of a “resilient portfolio” approach that matches systems to workloads.

Cerebras echoed that positioning, arguing that “real-time inference” can unlock new interaction patterns and that speed drives adoption for AI applications. As OEMs such as Lenovo push inference-first servers for enterprises, OpenAI is also signing multi-year inference-capacity deals to keep latency down at platform scale.

Why inference capacity matters

For enterprise buyers, a key strategic takeaway is where AI infrastructure demand is heading. McKinsey said in December 2025 that inference is on track to surpass training as the dominant AI data-center workload by 2030, when it would exceed half of AI compute and account for roughly 30–40% of total data-center demand. That shift changes hyperscaler site strategy, networking, and power provisioning.

The International Energy Agency similarly projects that global data-center electricity consumption will roughly double to ~945 TWh by 2030 in its base case, with rapid AI growth a major driver. In that context, OpenAI’s emphasis on low-latency inference capacity reflects a broader shift: once models are trained, the sustained, always-on serving layer can become the scaling bottleneck for deployment.

Cerebras economics and buyer due diligence

The Cerebras partnership also lands against a telling financial backdrop for specialized chip providers. In its SEC filing, Cerebras disclosed that G42 accounted for 87% of its total revenue for the six months ended June 30, 2024 (and 83% for the year ended December 31, 2023), highlighting the customer-concentration risk that large, multi-year anchor contracts can help address.

All of this expands the menu of viable inference options, and with it the due-diligence burden for buyers. OpenAI described a “mix of compute solutions” and phased integration across workloads, underscoring a heterogeneous infrastructure approach.