How CDNs Actually Work

A user in Korea requesting data from a US origin server sees a direct round trip of 150 ms or more. With a CDN edge cache in a Korean data center (IDC), that drops to around 5 ms. That is the core idea of a CDN. But how does a request reach the nearest POP? What happens on a cache miss? How do purges work? How does a CDN help against DDoS? This guide covers the CDN's internal behavior.

Edge POPs + BGP Anycast — Auto-Routing to the Nearest

Cloudflare's 300+ POPs (Points of Presence):
- Tokyo, Seoul, Singapore, Sydney, LA, London, ...

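The CDN's IP is announced via BGP anycast (1.1.1.1, Cloudflare's public DNS resolver, is a well-known anycast address):
- Every POP advertises the same IP prefix
- ISP BGP routers pick the best path by routing policy and AS-path length, which usually lands on a nearby POP
- Korean users → Seoul POP, Japanese users → Tokyo POP (typically)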

→ Not DNS magic — routing-layer magic. Same URL/IP reaches the nearest POP.

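Compared with:
  GeoDNS (DNS-based routing, as CloudFront has traditionally used): DNS returns different IPs per resolver/user IP
                                  → resolver location and DNS caching make it imprecise
  Anycast: the IP itself is routed → no dependence on DNS, though BGP policy can occasionally pick a farther POP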
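
As a rough illustration of the routing decision, the sketch below models a router that has learned the same anycast prefix from several POPs and picks the advertisement with the shortest AS path. The POP names, the documentation-range AS numbers and prefix, and the single-criterion selection are all assumptions made for the example; real BGP applies local preference and other policy before path length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # anycast prefix announced by every POP
    pop: str         # POP that originated this advertisement
    as_path: tuple   # AS numbers the advertisement has traversed


def best_route(routes: list) -> Route:
    """Pick the route with the shortest AS path (ties go to the first seen)."""
    return min(routes, key=lambda r: len(r.as_path))


# Hypothetical view from a Korean ISP's router.
# 203.0.113.0/24 and AS 64500-64503 are reserved for documentation.
seoul_isp_view = [
    Route("203.0.113.0/24", "Seoul", (64500,)),
    Route("203.0.113.0/24", "Tokyo", (64501, 64500)),
    Route("203.0.113.0/24", "LA", (64502, 64503, 64500)),
]

chosen = best_route(seoul_isp_view)
print(chosen.pop)  # Seoul
```

The client never chooses a POP. Every POP is reachable at the same address, and each network along the way forwards toward whichever advertisement it prefers.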

Cache Miss Flow

User (Korea)
  ↓ GET /image.jpg
Seoul POP
  ↓ cache miss
Origin shield (Tokyo regional cache)
  ↓ cache miss
Origin (US East data center)
  ← image data
Origin shield (Tokyo)
  ← cache + relay
Seoul POP
  ← cache + relay
User
  ← image data

On subsequent requests:
User → Seoul POP (cache hit, 5ms) → response.

Intent of the 3-layer hierarchy:
- Hits at Seoul POP are fastest
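- On a miss, the Tokyo regional cache absorbs the request, so not every POP has to hit origin
- With an origin shield, the origin sees roughly one fetch per object per TTL instead of one per POP; the actual reduction depends on POP count and hit rate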
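
The effect of the shield tier can be sketched with a toy model. The classes below are hypothetical and ignore TTLs, eviction, and request coalescing; they only count how many times the origin is contacted when every POP takes its first request for the same object.

```python
class Origin:
    """Stands in for the origin server and counts fetches."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"body-of-{key}"


class CacheTier:
    """A cache that falls through to its parent tier on a miss."""

    def __init__(self, parent):
        self.parent = parent
        self.store = {}

    def get(self, key):
        if key not in self.store:  # miss: ask the next tier up, then keep a copy
            self.store[key] = self.parent.get(key)
        return self.store[key]


def origin_fetches(pop_count, use_shield):
    origin = Origin()
    upstream = CacheTier(origin) if use_shield else origin
    pops = [CacheTier(upstream) for _ in range(pop_count)]
    for pop in pops:  # each POP receives its first request for the object
        pop.get("/image.jpg")
    return origin.fetches


print(origin_fetches(10, use_shield=False))  # 10
print(origin_fetches(10, use_shield=True))   # 1
```

Without the shield, origin load for a cold object grows with the number of POPs. With it, the shield takes the misses and the origin is asked once.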

Cache-Control Headers — Drive the CDN

Cache-Control: public, max-age=60, s-maxage=3600, stale-while-revalidate=86400

Meaning:
- public: cacheable everywhere (CDN + browser)
- max-age=60: browser caches 60s
- s-maxage=3600: CDN caches 1 hour (s = shared)
- stale-while-revalidate=86400: after expiry, serve stale up to 86400s
                                with background refresh

→ CDN caches 1 hour, browser caches 1 minute.
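→ Users don't see origin updates immediately: content can be up to 1 hour old while still fresh at the CDN, and stale-while-revalidate lets the CDN keep serving an older copy beyond that while it refreshes in the background.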
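
To make the interaction of these directives concrete, here is a minimal sketch that parses the header above and classifies a cached copy by its age. The function names are hypothetical, and the parser is deliberately simplified (no quoted values, no handling of no-store or private); it is not a full implementation of HTTP caching rules.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: int seconds or True}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if not name:
            continue
        directives[name.lower()] = int(value) if value.isdigit() else True
    return directives


def cache_state(header: str, age: int, shared: bool) -> str:
    """Classify a copy that is `age` seconds old, for a shared (CDN) or browser cache."""
    d = parse_cache_control(header)
    ttl = d.get("max-age", 0)
    if shared and "s-maxage" in d:  # s-maxage overrides max-age for shared caches
        ttl = d["s-maxage"]
    if age < ttl:
        return "fresh"
    if age < ttl + d.get("stale-while-revalidate", 0):
        return "stale-while-revalidate"  # serve the stale copy, refresh in background
    return "expired"


HEADER = "public, max-age=60, s-maxage=3600, stale-while-revalidate=86400"

print(cache_state(HEADER, age=120, shared=False))   # stale-while-revalidate
print(cache_state(HEADER, age=120, shared=True))    # fresh
print(cache_state(HEADER, age=4000, shared=True))   # stale-while-revalidate
print(cache_state(HEADER, age=100000, shared=True)) # expired
```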

Other directives:
- no-store: never cache
- private: browser cache only, no CDN
- must-revalidate: must check origin after stale
- immutable: never changes during cache lifetime (hash-named assets)

Purge — Immediate Invalidation

Instead of waiting for TTL:

URL-based:
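  CDN API (schematic; real endpoints differ per provider): POST /purge {url: "https://example.com/page-A"}
  → Broadcast to all POPs (propagation ranges from well under a second to tens of seconds, depending on the provider)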

Tag-based (Cloudflare, Fastly):
  Origin response header: Cache-Tag: product-42, category-shoes
  (Cache-Tag is Cloudflare's header; Fastly uses Surrogate-Key with space-separated keys)
  Later, on product 42 update:
    POST /purge {tag: "product-42"}
  → Invalidates every cache with that tag
  → No need to know individual URLs

Full purge:
  POST /purge {everything: true}
  → Rarely used (massive origin load). For incident recovery only.
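
The three purge styles can be sketched with a small in-memory model. The TaggedCache class and its method names are hypothetical and represent a single cache node; a real CDN keeps a similar tag-to-object index and fans the purge out to every POP.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that supports URL, tag, and full purges."""

    def __init__(self):
        self.entries = {}                    # url -> cached body
        self.urls_by_tag = defaultdict(set)  # tag -> urls carrying that tag

    def put(self, url, body, tags=()):
        self.entries[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def purge_url(self, url):
        """Remove one URL; return how many entries were removed (0 or 1)."""
        if url in self.entries:
            del self.entries[url]
            return 1
        return 0

    def purge_tag(self, tag):
        """Remove every entry stored with this tag; return the count removed."""
        urls = self.urls_by_tag.pop(tag, set())
        return sum(self.purge_url(u) for u in urls)

    def purge_everything(self):
        count = len(self.entries)
        self.entries.clear()
        self.urls_by_tag.clear()
        return count


cache = TaggedCache()
cache.put("/products/42", "...", tags=["product-42", "category-shoes"])
cache.put("/category/shoes", "...", tags=["category-shoes"])
cache.put("/about", "...")

removed = cache.purge_tag("product-42")
print(removed)                # 1
print(sorted(cache.entries))  # ['/about', '/category/shoes']
```

The caller only needs the tag, not the list of URLs that happen to render product 42.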

CDN = DDoS Defense

Origin capacity: 1000 req/s
Attacker: 100,000 req/s DDoS

Direct to origin → down.

With CDN in front:
1. Anycast spreads the attack across POPs (100,000 req/s over 300 POPs ≈ 333 req/s per POP if spread evenly; in practice the load follows where the attacking hosts are)
2. CDN's anti-DDoS (rate limiting, IP reputation, bot detection)
3. Cache-hit traffic never reaches origin (attacks that use cache-busting URLs are left to items 2 and 4)
4. Anomaly detection → challenge (CAPTCHA)

→ CDN providers (Cloudflare, Akamai, ...) are effectively also "DDoS defense services".
  That's the CDN's second value (the first being latency).
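
Rate limiting, one of the defenses listed above, is often built on a token bucket. The sketch below is a generic per-client token bucket, not any provider's actual implementation; the class names, the rate and burst values, and the documentation-range IP addresses are assumptions. Timestamps are passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client IP."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, client_ip, now):
        if client_ip not in self.buckets:
            self.buckets[client_ip] = TokenBucket(self.rate, self.burst)
        return self.buckets[client_ip].allow(now)


limiter = PerClientLimiter(rate=10, burst=20)

# One client fires 100 requests in the same instant; only the burst gets through.
allowed = sum(limiter.allow("198.51.100.7", now=0.0) for _ in range(100))
print(allowed)  # 20

# A different client is unaffected.
print(limiter.allow("198.51.100.8", now=0.0))  # True
```

Per-IP limits alone do little against a large botnet where each host sends only a few requests, which is why the list pairs them with reputation data, bot detection, and challenges.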

Common Pitfalls

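No Cache-Control: behavior then depends on the CDN's defaults and on heuristic freshness, so it is unpredictable. Set it explicitly on every static asset.
Cookies / Authorization preventing caching: many CDNs skip caching for such requests. Strip cookies on static paths or serve assets from a separate cookie-free path; Vary: Cookie rarely helps because nearly every user then gets a separate cache entry.
Not using immutable for assets: webpack hash-named .js / .css → immutable + max-age=31536000 (1 year).
Long TTL on HTML: HTML should be short (1-5 min) or use stale-while-revalidate. Otherwise updates lag.
Caching responses with cookies: if a response carrying Set-Cookie is cached, one user's session cookie can be served to another user. Verify that your CDN's defaults are safe.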
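
One way to avoid several of these pitfalls is to choose the header in a single place at the origin. The helper below is a minimal sketch: the function name, the filename pattern for hashed assets, and the specific TTLs for HTML and for the fallback case are assumptions to adapt, not recommended values from any provider.

```python
import re

# Assumed naming scheme for build output, e.g. app.3f9a1c2b.js
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css)$")


def cache_control_for(path: str, sets_cookie: bool = False) -> str:
    """Return a Cache-Control value for a response, by path and cookie use."""
    if sets_cookie:
        # Per-user responses must never land in a shared cache.
        return "private, no-store"
    if HASHED_ASSET.search(path):
        # Content-hashed file: the URL changes whenever the content does.
        return "public, max-age=31536000, immutable"
    if path.endswith(".html") or path.endswith("/"):
        # HTML: short TTLs so updates show up within minutes.
        return "public, max-age=60, s-maxage=300, stale-while-revalidate=600"
    # Hypothetical fallback for other static files.
    return "public, max-age=3600"


print(cache_control_for("/static/app.3f9a1c2b.js"))
# public, max-age=31536000, immutable
print(cache_control_for("/index.html"))
# public, max-age=60, s-maxage=300, stale-while-revalidate=600
print(cache_control_for("/account", sets_cookie=True))
# private, no-store
```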

Wrap-up

A CDN is edge caches plus anycast routing. It brings large latency cuts, lower origin load, and DDoS defense, and it is practically mandatory on the modern web.

In practice, don't rely blindly on CDN provider defaults: set Cache-Control explicitly, use cache tags for fine-grained purges, and monitor the cache hit rate. Aim for 90%+.
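
For the monitoring step, the hit rate is simply hits divided by total requests. The sketch below assumes the per-request cache status has already been extracted from CDN logs as strings such as "HIT" and "MISS"; the labels and the sample data are hypothetical, and actual status values vary by provider.

```python
from collections import Counter


def hit_ratio(statuses):
    """Fraction of requests served from cache; 0.0 for an empty sample."""
    counts = Counter(s.upper() for s in statuses)
    total = sum(counts.values())
    return counts["HIT"] / total if total else 0.0


# Hypothetical sample: 9 hits and 1 miss.
sample = ["HIT"] * 9 + ["MISS"]
print(hit_ratio(sample))  # 0.9
```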