Byte-Sized Design

Cloudflare’s July 2025 Outage: The Global Outage Triggered by a Dormant File

The Day Cloudflare Withdrew Its Own IP

Byte-Sized Design
Jul 18, 2025

On July 14, 2025, 1.1.1.1, one of the most widely used DNS resolvers, stopped working.

For 62 minutes, recursive DNS resolution failed for millions. Every query to 1.1.1.1 returned nothing. To many users, the entire internet broke.

The root cause wasn’t an attack, or a hijack, or a hardware failure. It was a dormant configuration error, committed 38 days earlier, sitting in a legacy system that no one expected to matter.


🌐 How Anycast Enables Fast Failure

Cloudflare uses anycast to route IPs like 1.1.1.1. The same address is advertised in dozens of locations. BGP routes your request to the closest data center.

The upside: low latency, high resilience. The downside: withdraw the route, and the service vanishes everywhere.
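The withdrawal dynamic can be sketched as a toy model (the site names and route-selection logic below are illustrative, not Cloudflare's actual routing):

```python
# Toy model of anycast: one prefix, advertised from many sites at once.
# BGP steers each client to its nearest advertising site.
advertisements = {"SJC", "FRA", "SIN"}  # sites currently advertising 1.1.1.0/24

def resolve_route(sites_by_proximity):
    """Return the nearest site still advertising the prefix, or None
    if the route has been withdrawn everywhere."""
    for site in sites_by_proximity:
        if site in advertisements:
            return site
    return None

# A client in Europe lands on the closest advertising site.
print(resolve_route(["FRA", "SJC", "SIN"]))  # FRA

# Withdraw the advertisement globally: no site answers, anywhere.
advertisements.clear()
print(resolve_route(["FRA", "SJC", "SIN"]))  # None -> global blackhole
```

The same property that gives every client a nearby answer also means a single withdrawal has no regional blast radius: the prefix is gone for everyone at once.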

That’s what happened. When the anycast advertisement was dropped, 1.1.1.1 ceased to exist from the perspective of the internet.

This wasn’t a localized outage. There was no fallback. Global reach meant global failure.


🧨 The Dormant Error from June 6

On June 6, Cloudflare engineers configured a new pre-production service for the Data Localization Suite (DLS). It wasn’t live. It wasn’t used. But its configuration accidentally included 1.1.1.1’s prefixes.

Because nothing in production pointed to the new service, the change had no immediate effect. No routes were withdrawn. No alerts fired. No tests failed.

The config error went unnoticed—sitting in production, waiting for someone to touch it.
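One guardrail that would catch this class of error is a pre-deploy check that no non-production service claims production address space. A minimal sketch (this is a hypothetical check, not Cloudflare's actual tooling; the prefix list reflects the resolver's public ranges):

```python
from ipaddress import ip_network

# Address space that only production services may claim.
PRODUCTION_PREFIXES = [ip_network("1.1.1.0/24"), ip_network("1.0.0.0/24")]

def validate_service_config(name, prefixes):
    """Reject any service config whose prefixes overlap production space."""
    for raw in prefixes:
        candidate = ip_network(raw)
        for prod in PRODUCTION_PREFIXES:
            if candidate.overlaps(prod):
                raise ValueError(
                    f"{name}: prefix {candidate} overlaps production {prod}"
                )

validate_service_config("dls-preprod", ["192.0.2.0/24"])   # fine: test range
# validate_service_config("dls-preprod", ["1.1.1.0/24"])   # would raise
```

Run at config-commit time, a check like this fails on June 6, when the mistake is made, rather than detonating on July 14.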


🕒 The Trigger: A Non-Production Change in July

On July 14 at 21:48 UTC, Cloudflare updated the same DLS service by attaching a test location.

That change triggered a global refresh of route topologies.

Because the old config mistakenly assigned 1.1.1.1’s IPs to the test service, those IPs were now routed exclusively to a data center that didn’t exist. The rest of the network dropped them.

This refresh wasn’t supposed to touch 1.1.1.1 at all. But the legacy systems lacked the guardrails to prevent overlap between active and inactive services. There was no deployment ring, no progressive rollout, and no validation that excluded production addresses.
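A ring-based rollout bounds the damage of exactly this kind of change. A minimal sketch, assuming a per-site apply/rollback hook and a health check (all names here are hypothetical):

```python
def deploy(change, rollback, rings, healthy):
    """Apply `change` ring by ring, gating each ring on a health check.
    On the first unhealthy ring, roll back everything applied so far."""
    applied = []
    for ring in rings:
        for site in ring:
            change(site)
            applied.append(site)
        if not all(healthy(site) for site in ring):
            # Contain the blast radius: undo in reverse order and stop.
            for site in reversed(applied):
                rollback(site)
            return False
    return True

# Usage: a change that breaks prod-2 never reaches the rest of the network.
log = []
ok = deploy(
    change=lambda s: log.append(("apply", s)),
    rollback=lambda s: log.append(("rollback", s)),
    rings=[["canary-1"], ["prod-1", "prod-2"]],
    healthy=lambda s: s != "prod-2",
)
```

Had the July 14 topology refresh gone to a single test location first, the withdrawn 1.1.1.1 routes would have failed one health check instead of the whole internet.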

📉 Impact Timeline (UTC)
