Skip to content
yutils

How Garbage Collection Actually Works

Mark-and-sweep, reference counting, generational hypothesis, why 99% of objects die young, the stop-the-world pause, G1 vs ZGC vs Shenandoah, and why GC tuning is really just allocation pattern tuning.

~9 min read

Java, Go, JavaScript, Python — all reclaim memory via GC. But if you treat GC as "magic," you're powerless when GC pauses cause production latency spikes. This guide covers the actual mechanics of mark-and-sweep, why the generational hypothesis works, the meaning of the "99% die young" statistic, and the trade-offs between modern GCs (G1, ZGC, Shenandoah).

The Problem GC Solves — Reachability

function foo() {
  let a = {x: 1};   // alloc
  let b = {y: 2};   // alloc
  a.next = b;       // a → b reference
  return a;         // only a returned
}
const root = foo();
// root → a → b (all reachable)

root.next = null;  // break a's b reference
// now b is unreachable → GC will reclaim it

The core question of GC: which objects are reachable? Reachable = can be traversed from a root (stack variables, globals, registers) through reference chains. The rest is garbage.

Algorithm 1 — Reference Counting (Python, Swift, ObjC ARC)

# Python (CPython)
a = [1, 2, 3]   # refcount = 1
b = a           # refcount = 2
b = None        # refcount = 1
a = None        # refcount = 0 → freed immediately
  • Pros: instant reclaim (deterministic). No pauses.
  • Con 1: increment/decrement on every assignment.
  • Con 2: can't catch cyclic references. If a → b → a, both have refcount ≥ 1 forever.

CPython uses refcount + a separate cycle detector. Swift makes cycle avoidance the programmer's responsibility via weak references. Simplicity is the appeal.

Algorithm 2 — Mark-and-Sweep

Phase 1 — Mark (reachability tracing):
  worklist = [root1, root2, ...]
  while worklist not empty:
    obj = worklist.pop()
    if not obj.marked:
      obj.marked = true
      for child in obj.references:
        worklist.push(child)

Phase 2 — Sweep:
  for obj in heap:
    if not obj.marked:
      free(obj)
    else:
      obj.marked = false  // reset for next cycle
  • Handles cycles correctly.
  • No increment/decrement overhead.
  • But walks the whole heap — pauses get long with large heaps.

The Generational Hypothesis — 99% Die Young

The empirical distribution in real programs:

Object lifetime distribution (typical server app):

Lifetime < 1 GC cycle  : ~95–99%  ← per-request temporaries
Lifetime ~few cycles   : ~1–4%
Lifetime > 100 cycles  : ~0.1% (caches, global state)

Most garbage is young. So instead of walking the whole heap every time, walk young often, old rarely.

Generational GC Layout

heap:
┌──────────────┬──────────────┬──────────────────┐
│ Young (Eden) │ Survivor 0/1 │     Old Gen      │
└──────────────┴──────────────┴──────────────────┘
       ↑              ↑              ↑
   new alloc      after minor GC   after N survives
   lands here     survivors here   promote to old

Minor GC (frequent, fast):
- walk Young only
- copy survivors → Survivor
- bulk-free the rest of Eden (cost ∝ survivors, dead cost 0!)

Major GC (rare, slow):
- walk whole heap (incl. Old)

Key trick — copying collector. Copy only survivors to another area, bulk-free the source. If few objects survive (95% die), this is extremely cheap.

Cross-generation Reference Problem

If an old-gen object references a young-gen one?
→ minor GC walking only young wouldn't know that young object is reachable

Solution: write barrier
- mark a "card" when old writes a young reference
- minor GC additionally scans cards as extra roots

Write barrier = a tiny extra instruction on every reference store. Costs something, but generational's throughput win is much larger.

Stop-the-World — All Threads Paused

Normal:    [T1 running] [T2 running] [T3 running]
GC start:  [T1 pause ] [T2 pause ] [T3 pause ]
GC:         (single-threaded mark + sweep)
GC end:    [T1 resume] [T2 resume] [T3 resume]

Impact: can't serve requests during pause → latency spike

Old Java GCs (Serial, Parallel) STW the entire collection. A Major GC on a 100 GB heap can pause for seconds — enemy of p99 latency and SLAs.

Modern GC — Shrinking the Pause

Concurrent — Mark Phase Runs Alongside the App

STW only for taking the mark snapshot and sweep. Most of the time the app runs concurrently with the collector.

Incremental — Break GC into Small Pieces

Split GC into small chunks and interleave with app execution. No single long pause.

Region-Based — Many Small Regions

G1, ZGC. Divide the heap into hundreds of regions and reclaim those with the most garbage first. Major GC can be partial.

Java GC Choices (Common)

GCPauseThroughputBest for
SerialLongMedium<100MB heap, single core
ParallelLongBestBatch jobs, throughput first
G1 (default in 9+)Med (10–200 ms)MediumGeneral server, 4–100 GB
ZGC<1ms most of the timeSlight hitLarge heaps (100GB+), latency SLAs
Shenandoah<10msSlight hitLatency SLAs, OpenJDK

Go's GC — Concurrent Tri-Color Mark-Sweep

Go's GC is not generational (escape analysis keeps things on stack when possible). Concurrent mark + write barriers + short STW (target 100 μs – few ms).

Tri-color invariant:
- white : not yet marked (garbage candidate)
- gray  : marked but children not yet visited
- black : marked + all children visited

During mark phase:
- when gray worklist empties, mark is done
- if app creates a black → white reference, invariant breaks
  → write barrier turns the white object gray (preserves mark)

V8 (JavaScript) GC

  • Generational (young = "new space" + survivors; old = "old space")
  • Young: Cheney's copying algorithm (Scavenge)
  • Old: Concurrent mark + sweep + compact (Major GC, incremental)

The Practical Truth About GC Tuning

Most GC tuning is really allocation pattern tuning, not GC option tweaking.

// Bad — thousands of new objects per request
public Response handle(Request req) {
    List<Item> items = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        items.add(new Item(i));  // 10000 allocs
    }
    return build(items);
}  // all immediately garbage

// Good — reuse via pool
private final ItemPool pool = new ItemPool(10000);
public Response handle(Request req) {
    pool.reset();
    for (int i = 0; i < 10000; i++) {
        pool.acquire().set(i);  // 0 allocs (reused)
    }
    return build(pool);
}

Allocation Rate Drives GC Pause Frequency

Higher allocation rate → young gen fills faster → more minor GCs → higher cumulative GC pause.

Sudden Long-Lived Object Bursts Cause Promotion Storms

Survivor fills, objects get promoted to Old quickly → Old fills → Major GC → long pause.

Common Pitfalls

  • "GC handles it" illusion — GC is a large contributor to production latency spikes. You won't know without profiling.
  • Cranking heap size up indiscriminately — larger heap = longer GC pauses (pre-G1). ZGC is fine with it.
  • Excessive immutable-object churn (String etc.) — every modification allocates. Use StringBuilder / pools.
  • Finalizers / WeakReference overuse — extra GC work. Prefer explicit close.

Wrap-up

GC is not magic. It's mark-and-sweep + generational + concurrent, composed into modern collectors. Understanding the machinery lets you answer questions like "why does our p99 spike occasionally?"

Your allocation pattern drives GC behavior — read together with how-memory-allocation-works.

Back to guides