Cloudflare’s network suffered a major outage on November 18, 2025, triggering waves of HTTP errors across websites worldwide and underscoring the fragility of the digital services that underpin much of the global economy.

The incident — Cloudflare’s worst since 2019 — was traced not to a cyberattack but to an internal configuration error deep within the company’s bot-mitigation infrastructure.

According to Cloudflare, a routine change in database permissions caused the system to generate duplicate entries in the feature file used by its Bot Management service, a core tool designed to detect and block malicious automated traffic.

The duplicated data swelled the feature file well beyond its intended limits, destabilizing the proxy service and prompting widespread failures across its network. The company restored the majority of traffic by 9:30 am EST, with full recovery by late morning.
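Cloudflare has not published the code involved, but the failure pattern maps to a familiar defensive idiom: validate a machine-generated configuration file before it reaches the serving path, and fall back to the last known-good copy if it does not pass. The sketch below is a minimal, hypothetical illustration in Python; the file names, the MAX_FEATURES cap and the validate helper are assumptions for illustration, not details from Cloudflare's postmortem.

```python
# Hypothetical sketch (not Cloudflare's code): validate a generated
# feature file before it is pushed to the proxy fleet. Reject files with
# duplicate entries or an entry count above a hard cap, and keep serving
# the last known-good version instead of failing the request path.

import json

MAX_FEATURES = 200                       # assumed hard limit on entries
LAST_KNOWN_GOOD = "features_good.json"   # assumed last-validated copy

def load_feature_file(path: str) -> list[dict]:
    with open(path) as f:
        return json.load(f)

def validate(features: list[dict]) -> list[dict]:
    names = [feat["name"] for feat in features]
    if len(names) != len(set(names)):
        raise ValueError("duplicate feature entries detected")
    if len(features) > MAX_FEATURES:
        raise ValueError(f"feature count {len(features)} exceeds cap {MAX_FEATURES}")
    return features

def load_with_fallback(candidate: str) -> list[dict]:
    try:
        return validate(load_feature_file(candidate))
    except (ValueError, OSError, json.JSONDecodeError) as exc:
        # Fall back to the last configuration that passed validation
        # rather than letting a bad file destabilize the serving path.
        print(f"rejecting new feature file: {exc}")
        return load_feature_file(LAST_KNOWN_GOOD)
```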

The consequences were hard to miss. Cloudflare carries about one-fifth of global internet traffic, so the disruption was felt across a wide swathe of the online economy.

OpenAI reported a “full outage” affecting ChatGPT and related APIs. Canva, Grindr and Dropbox’s DocSend service all logged issues tied to Cloudflare’s infrastructure, while Cloudflare customers found themselves unable to authenticate to the company’s own dashboard.

The outage affected Cloudflare’s core security and CDN services, as well as Turnstile, Workers KV and Access.

Cloudflare’s CTO, Dane Knecht, publicly apologized, acknowledging in a social-media post that “a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change… [cascading] into a broad degradation of our network and other services.”

In the day since the incident, the company has committed to strengthening configuration-file handling, introducing global kill-switches and reviewing failure modes across its proxy modules.
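The company has not detailed what those kill-switches will look like. As a rough illustration of the general pattern, the hypothetical sketch below gates a single scoring module behind an operator-controlled flag so it can be disabled globally without a redeploy; the flag name, the scoring call and the fail-open behavior are all assumptions, not Cloudflare specifics.

```python
# Hypothetical sketch of a global kill-switch for one proxy module.
# An operator-controlled flag lets the feature be disabled fleet-wide
# without a code deploy; when the module is off, or it throws, requests
# fall through with a neutral score instead of failing.

import os

def bot_management_enabled() -> bool:
    # In practice this would come from a replicated control-plane flag;
    # an environment variable is just a stand-in for the sketch.
    return os.environ.get("BOT_MANAGEMENT_ENABLED", "true") == "true"

def run_bot_model(request: dict) -> float:
    # Placeholder for the real classifier.
    return 0.1

def score_request(request: dict) -> float:
    if not bot_management_enabled():
        return 0.0  # kill-switch engaged: skip scoring entirely
    try:
        return run_bot_model(request)
    except Exception:
        return 0.0  # fail open rather than failing the request
```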

Industry experts: This is a sign of a larger problem

Industry observers say the incident highlights a broader structural risk, echoing concerns raised only weeks earlier during the AWS outage.

Brent Ellis, principal analyst at Forrester, says the event “shows the impact of concentration risk,” estimating that the three-hour disruption could result in direct and indirect losses of $250 million to $300 million.

“Resilience isn’t free,” he says. “Businesses will need to decide if they want to invest in alternative providers and failover solutions.”

Christina Kosmowski, CEO of observability firm LogicMonitor, says the outage illustrates a harsh reality: most organizations depend on an invisible stack of third-party services they neither manage nor fully understand.

“This isn’t about blame. It’s about truth,” she says. “Modern businesses depend on layers of technology they don’t control — and when one link fails, the impact ripples fast.”

Observability, she argues, “isn’t a nice-to-have anymore. It’s the control room for resilience. Uptime isn’t a metric — it’s reputation, revenue and trust.”

Benjamin Schilz, CEO of secure-collaboration platform Wire, adds that the digital ecosystem’s dependence on a handful of hyperscalers creates “a massive single point of failure.”

While large-scale outages are not new, he warns that the concentration of critical infrastructure among predominantly US-based providers has left governments and enterprises overly exposed.

“Resilience, diversity and redundancy must always be weighed against convenience,” he says. “True resilience isn’t just about redundancy; it’s about maintaining control over your own data.”

For sectors such as payments, the systemic risk is even more serious: “The infrastructure behind a single transaction relies on a chain of cloud platforms, processors, APIs and authentication tools,” says Fadl Mnatash, CISO at Tribe Payments.

“When any link fails, the entire journey can break.” He says organizations need a “prepper mindset,” rehearsing failure scenarios and isolating faults before they cascade.

Failures can accelerate regulatory response

Regulators in the UK and EU are actively strengthening operational-resilience requirements, and some expect this incident to accelerate scrutiny of cloud-dependency risks.

Mayur Upadhyaya, CEO of APIContext, says enterprises must move beyond internal monitoring and adopt “outside-in testing” to baseline normal behavior across their digital supply chains.

“In an internet increasingly run by APIs and automation, resilience isn’t just about uptime,” he says. “It’s about knowing when critical services are degrading before users — or machines — feel the impact.”
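Upadhyaya does not prescribe specific tooling, but the idea of outside-in testing can be sketched simply: probe the third-party endpoints you depend on from outside your own network, record latency and status over time, and alert when behavior drifts from the baseline. The example below is a minimal, hypothetical illustration; the endpoint list, history window and thresholds are placeholders.

```python
# Hypothetical outside-in probe: hit external dependencies on a schedule,
# record latency and status, and flag anything that drifts well past the
# rolling baseline. Endpoints and thresholds here are placeholders.

import statistics
import time
import urllib.request
from collections import defaultdict, deque

ENDPOINTS = ["https://api.example-dependency.com/health"]  # placeholder
HISTORY: dict[str, deque] = defaultdict(lambda: deque(maxlen=100))

def probe(url: str) -> tuple[int, float]:
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status, time.monotonic() - start

def check(url: str) -> None:
    try:
        status, latency = probe(url)
    except Exception as exc:
        print(f"ALERT {url}: unreachable ({exc})")
        return
    history = HISTORY[url]
    if len(history) >= 10:
        baseline = statistics.median(history)
        if status >= 500 or latency > 3 * baseline:
            print(f"ALERT {url}: status={status} latency={latency:.2f}s "
                  f"(baseline {baseline:.2f}s)")
    history.append(latency)

if __name__ == "__main__":
    for url in ENDPOINTS:
        check(url)
```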
