How did a single database permission change bring down a significant portion of the internet? On November 18, 2025, a seemingly minor alteration to database permissions in Cloudflare’s ClickHouse cluster triggered a cascading failure that affected roughly 20% of webpages worldwide.
The change caused queries to return duplicate rows, generating an oversized feature file that exceeded software limits when processed by Cloudflare’s Bot Management system.
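As a rough illustration of that mechanism, the sketch below shows how duplicate rows can inflate a generated file, and how de-duplicating by name would keep it at its expected size. The row contents, feature names, and function names are all hypothetical; this is not Cloudflare's query or file format.

```python
import json

# Hypothetical metadata rows (feature name, type) as a query might return them.
EXPECTED_ROWS = [("signal_a", "Float64"), ("signal_b", "Int64")]

# Assumed failure mode for illustration: every row comes back twice.
DUPLICATED_ROWS = EXPECTED_ROWS + EXPECTED_ROWS


def build_feature_file(rows):
    """Serialize rows into a feature file without any de-duplication."""
    return json.dumps([{"name": name, "type": type_} for name, type_ in rows])


def dedupe_rows(rows):
    """Keep the first occurrence of each feature name, preserving order."""
    seen = set()
    unique = []
    for name, type_ in rows:
        if name not in seen:
            seen.add(name)
            unique.append((name, type_))
    return unique
```

Building the file from `DUPLICATED_ROWS` yields twice as many entries as building it from `dedupe_rows(DUPLICATED_ROWS)`, which is the kind of growth the downstream consumer was not prepared for.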
The impact was immediate and far-reaching. Major services including X (Twitter), OpenAI, Anthropic, Spotify, Zoom, and Coinbase displayed Cloudflare error pages, leaving millions of users unable to access critical platforms.
Approximately one-third of the world’s top 10,000 websites experienced disruptions during the nearly six-hour outage that began at 11:20 UTC.
Automated tests first flagged the problem at 11:31 UTC. The faulty feature file had doubled in size, causing machine-learning models to malfunction and creating a domino effect across Cloudflare’s infrastructure.
Workers KV saw degraded response rates, while authentication services failed for most users. Cloudflare Access likewise recorded 100% failure rates for identity-based logins across all application types during the outage.
The incident exposed a critical design limitation: the system preallocates memory for at most 200 features, and the oversized feature file exceeded that limit.
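One way to think about a hard limit like this is to validate an incoming file against it and keep the previous good configuration when validation fails, rather than letting the consumer crash. The sketch below assumes that approach; only the 200-feature figure comes from the reporting above, and every name in the code is hypothetical.

```python
MAX_FEATURES = 200  # the limit cited above; everything else here is illustrative


class FeatureLimitExceeded(ValueError):
    """Raised when a candidate feature list is larger than the allowed limit."""


def validate_features(features, limit=MAX_FEATURES):
    """Return a copy of the features, or raise if there are too many."""
    if len(features) > limit:
        raise FeatureLimitExceeded(
            f"{len(features)} features exceeds limit of {limit}"
        )
    return list(features)


def load_or_keep_previous(candidate, previous, limit=MAX_FEATURES):
    """Accept the candidate if valid; otherwise keep the previous list.

    Returns a (features, accepted) pair so callers can alert on rejection.
    """
    try:
        return validate_features(candidate, limit), True
    except FeatureLimitExceeded:
        return previous, False
```

With this shape, an oversized file is rejected and reported while the last accepted feature list stays in service.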
Recovery efforts proceeded methodically:
- Engineers halted generation of the problematic file
- A known-good version was injected into the system
- Key services were restarted to restore normal operations
- Traffic-management techniques were applied to stabilize services
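The first two steps above, halting generation and pinning a known-good version, can be sketched as a small publisher with a kill switch. This is a toy model under assumed names (`FeatureFilePublisher`, `halt_generation`, `pin_known_good`), not a description of Cloudflare's tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FeatureFilePublisher:
    """Toy publisher with a kill switch and a pinned known-good version."""

    versions: Dict[str, str] = field(default_factory=dict)  # version id -> contents
    active: Optional[str] = None
    generation_enabled: bool = True

    def publish(self, version: str, contents: str) -> bool:
        """Store and activate a new version unless generation is halted."""
        if not self.generation_enabled:
            return False
        self.versions[version] = contents
        self.active = version
        return True

    def halt_generation(self) -> None:
        """Kill switch: stop any further versions from being published."""
        self.generation_enabled = False

    def pin_known_good(self, version: str) -> None:
        """Roll back to a previously stored version."""
        if version not in self.versions:
            raise KeyError(version)
        self.active = version

    def active_contents(self) -> str:
        """Return the contents of the currently active version."""
        return self.versions[self.active]
```

In this model, an operator would call `halt_generation()` first so no new bad file can replace the rollback, then `pin_known_good(...)` with the last version known to work.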
By 14:30 UTC, core traffic flows had returned to normal, though full recovery wasn’t achieved until 17:06 UTC.
The incident demonstrates the internet’s fragile interdependence, where a single component failure can ripple through global digital infrastructure.
This outage serves as a reminder of centralization risks in cloud services. When Cloudflare, a company that handles traffic for approximately 20% of the internet, falters, the digital world trembles.
For organizations, this highlights the importance of redundancy planning and failover strategies that don’t rely on a single provider’s infrastructure, no matter how reliable they typically are.
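A failover strategy of this kind can be as simple as trying providers in priority order and falling through on error. The sketch below is a minimal, assumption-laden version: the provider names and the `fetch` callable are placeholders supplied by the caller, and real deployments would usually add health checks, timeouts, and DNS-level or load-balancer-level switching.

```python
from typing import Callable, Sequence, Tuple


def fetch_with_failover(
    providers: Sequence[str], fetch: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider, response) for the
    first one that succeeds. Raise RuntimeError if every provider fails."""
    errors = []
    for provider in providers:
        try:
            return provider, fetch(provider)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client and makes the failover order easy to exercise with a stub.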
Just as EDI’s encrypted protocols protect electronic transactions between businesses, cloud service providers need robust safeguards to keep a single fault from cascading across critical systems.