How Flaky Tests Actually Work

The same test passes one run, fails the next. CI trust slowly erodes → "probably flaky again" → real bugs slip through. This guide covers the five root causes of flaky tests, why "just retry" is the wrong fix, and how to find and kill them systematically.

What Flaky Really Means — Non-Deterministic

Normal test:
  same input → same output → always the same result (pass or fail)

Flaky test:
  same input → different output → result varies from run to run

Cause: something non-deterministic in the test.
- Time (Date.now)
- Random (Math.random, UUID v4)
- Concurrency (race conditions)
- External state (network, file, env)
- Order dependency (shared state)

→ "the test fails randomly" is false — only the trigger is random;
  there's always a cause.

Five Root Causes

1. Race Conditions (most common)

// Doesn't wait for async work
test("clicks button", async () => {
  await user.click(button);
  expect(screen.getByText("Success")).toBeVisible();  // ← race
});

Problem: "Success" takes a few milliseconds to render. The test passes
         on fast machines and fails on slow CI / loaded runners.

Fixes:
- Use waitFor
  await waitFor(() => expect(screen.getByText("Success")).toBeVisible());
- findBy* (auto-waits)
  await screen.findByText("Success");
- Explicit sleep (last resort, anti-pattern)
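
To make the difference between a one-shot assertion and a waiting one concrete, here is a minimal sketch of the polling idea behind waitFor. pollUntil is a hypothetical helper, not Testing Library's implementation, and the 1 s timeout, 10 ms interval, and 30 ms simulated render delay are arbitrary assumptions.

```typescript
// Hypothetical helper illustrating the polling idea behind waitFor.
// Not Testing Library's implementation; the defaults are assumptions.
async function pollUntil(
  assertion: () => void,
  timeoutMs: number = 1000,
  intervalMs: number = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      assertion(); // assertion passed: stop waiting
      return;
    } catch (err) {
      if (Date.now() >= deadline) throw err; // give up with the last failure
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}

// Simulated UI: the "Success" text appears after a short delay.
let renderedText = "";
setTimeout(() => { renderedText = "Success"; }, 30);

// Asserting immediately would race; polling retries until the condition holds.
const waited: Promise<void> = pollUntil(() => {
  if (renderedText !== "Success") throw new Error("not rendered yet");
});
```

Unlike a fixed sleep, the wait ends as soon as the condition holds, and the timeout only bounds the failure case.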

2. Order Dependency

test("A creates user 1", () => {
  db.users.insert({id: 1, name: "A"});
});

test("B finds user 1", () => {
  const u = db.users.findById(1);  // ← needs A to have run first
  expect(u.name).toBe("A");
});

Problem:
- Parallel or random test order → B runs first → fails
- One test's setup leaks into another

Fixes:
- Each test owns setup + teardown
- DB reset in beforeEach
- Transaction-rollback pattern (rollback at test end)
- Module isolation (jest.isolateModules / jest.resetModules; Jest already gives each test file its own module registry)
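
A minimal sketch of "each test owns setup + teardown", assuming a hypothetical in-memory UserStore in place of the db object above. The class and function names are illustrative only.

```typescript
// Hypothetical in-memory stand-in for the db.users object used above.
interface User { id: number; name: string; }

class UserStore {
  private rows = new Map<number, User>();
  insert(user: User): void { this.rows.set(user.id, user); }
  findById(id: number): User | undefined { return this.rows.get(id); }
}

// Each test builds the state it needs instead of relying on an earlier test.
function testCreatesUser(): boolean {
  const store = new UserStore();        // fresh state, owned by this test
  store.insert({ id: 1, name: "A" });
  return store.findById(1)?.name === "A";
}

function testFindsUser(): boolean {
  const store = new UserStore();        // does not depend on testCreatesUser
  store.insert({ id: 1, name: "A" });   // own setup
  return store.findById(1)?.name === "A";
}

// Either execution order gives the same results.
const forwardOrder = [testCreatesUser(), testFindsUser()];
const reverseOrder = [testFindsUser(), testCreatesUser()];
```

Because nothing outlives a test, random or parallel ordering cannot change the outcome.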

3. Time-Dependent

test("token expires after 1 hour", () => {
  const token = createToken();
  // wait 1 hour... impossible
  // or hack token.createdAt directly (implementation coupling)
  expect(token.isValid()).toBe(false);  // how?
});

Problems:
- Real-time dependence = 1-hour test
- new Date() usage breaks at midnight / DST boundaries

Fixes:
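- Mock time (jest.useFakeTimers, sinon.useFakeTimers)
  jest.useFakeTimers();
  jest.setSystemTime(new Date("2026-05-25"));
  const token = createToken();                    // created at the mocked time
  jest.advanceTimersByTime(60 * 60 * 1000 + 1);   // 1 hour + 1 ms, instantly
  expect(token.isValid()).toBe(false);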
- Inject time (clock as a function arg)
  createToken({now: () => "2026-05-25T00:00:00Z"});
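
The injected-clock variant can be sketched as follows. This is an illustration under assumptions: createClockedToken, the Clock type, and the ttlMs parameter are hypothetical names, and the clock here returns epoch milliseconds rather than a string.

```typescript
// Hypothetical token API; names and the millisecond clock are assumptions.
type Clock = () => number; // returns epoch milliseconds

interface Token { isValid(): boolean; }

const ONE_HOUR_MS = 60 * 60 * 1000;

function createClockedToken(now: Clock = Date.now, ttlMs: number = ONE_HOUR_MS): Token {
  const createdAt = now();
  return {
    // Valid until more than ttlMs has elapsed on the injected clock.
    isValid: () => now() - createdAt <= ttlMs,
  };
}

// The test controls time by changing the value its fake clock returns.
let fakeNow = Date.UTC(2026, 4, 25); // 2026-05-25T00:00:00Z (month is 0-based)
const clockedToken = createClockedToken(() => fakeNow);

const validAtStart = clockedToken.isValid();
fakeNow += ONE_HOUR_MS + 1;          // one hour and 1 ms later, instantly
const validAfterExpiry = clockedToken.isValid();
```

Production code keeps the Date.now default, so only tests need to pass a clock.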

4. Network / External Service

test("fetches from API", async () => {
  const data = await fetch("https://api.example.com/users").then(r => r.json());
  expect(data.length).toBeGreaterThan(0);
});

Problems:
- External API down or slow → fail
- Response data varies
- Rate-limited

Fixes:
- HTTP mock (nock, msw)
  - Real network for one integration test only, mock the rest
  - Contract tests separately verify the spec
- Explicit timeouts + explicit retries (app intent, not auto-retry)
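
Where a library such as nock or msw is not available, the same idea can be sketched by passing the transport in as a parameter. loadUsers, GetJson, and the canned response below are hypothetical; only the URL comes from the example above.

```typescript
// Hypothetical types and names; real code would wrap fetch or use an HTTP mock.
interface ApiUser { id: number; name: string; }
type GetJson = (url: string) => Promise<unknown>;

// The transport is a parameter, so tests can substitute a canned response.
async function loadUsers(getJson: GetJson): Promise<ApiUser[]> {
  const body = await getJson("https://api.example.com/users");
  if (!Array.isArray(body)) throw new Error("unexpected response shape");
  return body as ApiUser[];
}

// Test double: records requested URLs and returns fixed data, no network involved.
const requestedUrls: string[] = [];
const stubGetJson: GetJson = async (url) => {
  requestedUrls.push(url);
  return [{ id: 1, name: "A" }];
};

const loadedUsers: Promise<ApiUser[]> = loadUsers(stubGetJson);
```

The test now fails only when loadUsers itself is wrong, not when the external API is slow, down, or rate-limited.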

5. Shared State / Pollution

// Module-level state
let cachedUser = null;
function getUser() { ... cachedUser = ...; }

test("A", () => { getUser(); /* pollutes cachedUser */ });
test("B", () => { /* assumes empty cache */ });  // ← fails if A ran first

Problems:
- Module-level vars, singletons, env vars, file system, ...
- One test's side effects influence another

Fixes:
- afterEach reset (clear cache, restore env, delete files)
- jest.resetModules() in beforeEach, then re-require the module inside the test (each test gets a fresh module instance)
- Single responsibility (each test depends only on its own setup)
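
A minimal sketch of the afterEach-reset fix, assuming a hypothetical module-level cache similar to cachedUser above. runIsolated stands in for a test runner's afterEach hook, and all names are illustrative.

```typescript
// Hypothetical module with a module-level cache, mirroring cachedUser above.
let cachedProfile: { name: string } | null = null;

function getProfile(): { name: string } {
  if (cachedProfile === null) {
    cachedProfile = { name: "A" }; // stands in for an expensive lookup
  }
  return cachedProfile;
}

// Exposed only for tests: puts the module back into its initial state.
function resetProfileCacheForTests(): void {
  cachedProfile = null;
}

// Minimal stand-in for afterEach: run a test body, then always reset.
function runIsolated<T>(body: () => T): T {
  try {
    return body();
  } finally {
    resetProfileCacheForTests();
  }
}

const testA = (): boolean => { getProfile(); return cachedProfile !== null; }; // fills the cache
const testB = (): boolean => cachedProfile === null;                           // assumes empty cache

const resultA = runIsolated(testA);
const resultB = runIsolated(testB); // still true although A ran first
```

The reset runs in a finally block so that a failing test cannot leak state into the next one.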

Why Auto-Retry Is the Wrong Fix

// CI config
retries: 3   // flaky? retry 3 times, one pass = OK

→ Effectively "hiding" flaky tests.

Problem 1 — Erodes CI trust:
  "Failed? Just retry" becomes habit
  Real bugs are masked by retries
  → "this test failure is random" mental model
  → real failures ignored

Problem 2 — Root cause not fixed:
  Flaky root cause is often a real bug (race, leak, ...)
  Retries hide the bug; the same race hits real users in production

Problem 3 — Slow CI:
  Every retry reruns the test, so more flaky tests → CI takes longer

→ Retry = temporary measure. Permanent use is anti-pattern.

Correct approach:
1. Quarantine the failing test (isolated group, doesn't affect main suite)
2. Find the root cause (next section)
3. Unquarantine after fix

Systematic Hunting

1. Run the Test 1000 Times

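# Just the suspected test, 1000 times (jest runs it once per invocation, so loop in the shell)
for i in $(seq 1000); do
  jest path/to/test.spec.ts --testNamePattern "flaky one" \
    --runTestsByPath --maxWorkers=1 || echo "run $i failed"
done

# 950 pass / 50 fail → 5% flaky
# Compare stack traces / logs at fail time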

# Vitest: repeat a test in-process with the per-test repeats option
#   test("flaky one", { repeats: 100 }, async () => { ... })
vitest run path/to/test
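
The pass/fail tally can also be collected in-process. The sketch below is a hypothetical harness, not a Jest or Vitest feature; flakyCheck is a deterministic stand-in that fails on every 20th call so the numbers are reproducible.

```typescript
// Hypothetical harness: run a check many times and report how often it fails.
function measureFailureRate(
  check: () => void,
  runs: number,
): { failures: number; rate: number } {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      check();
    } catch {
      failures += 1; // a real harness would also keep the error for comparison
    }
  }
  return { failures, rate: failures / runs };
}

// Deterministic stand-in for a flaky test: fails on every 20th call (5%).
let callCount = 0;
const flakyCheck = (): void => {
  callCount += 1;
  if (callCount % 20 === 0) throw new Error(`failed on call ${callCount}`);
};

const flakeReport = measureFailureRate(flakyCheck, 1000);
// For this stand-in: flakeReport.failures is 50 and flakeReport.rate is 0.05
```

A rate measured this way gives a baseline to compare against after a fix: the same 1000 runs should then produce zero failures.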

2. Randomize Test Order

# Jest 29.2+: shuffles tests within each file and prints the seed; replay with --seed=<n>
jest --randomize
# Vitest
vitest run --sequence.shuffle

→ Order dependencies show up as patterned failures.

Tools: jest-circus (the default runner, required by --randomize), vitest sequence.shuffle / sequence.seed
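
Randomized order is only useful for debugging if a failing order can be replayed, which is why runners expose a seed. A minimal sketch of a seeded shuffle follows; it uses the small mulberry32 generator and a Fisher-Yates shuffle as assumptions for illustration, not the algorithm Jest or Vitest use internally.

```typescript
// mulberry32: a small seeded PRNG, used here so a shuffle can be replayed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator; the input is not mutated.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const out = items.slice();
  const rand = mulberry32(seed);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

const testNames = ["A creates user 1", "B finds user 1", "C deletes user 1"];
const orderA = shuffleWithSeed(testNames, 42);
const orderB = shuffleWithSeed(testNames, 42); // same seed: same order, so a failure can be replayed
```

Logging the seed next to every CI run turns "it failed once in some order" into a reproducible case.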

3. Vary Worker Count

jest --maxWorkers=1   # serial
jest --maxWorkers=4   # 4 parallel workers

→ Failure pattern changing with maxWorkers = concurrency issue
→ Passes only with maxWorkers=1 = parallel test files contend for a shared external resource (DB, files, ports)

4. CI-Specific

Locally passes / CI fails →
- env var differences
- resource (CPU/memory) differences → timing
- network differences
- timezone (UTC vs local) differences

Reproduce CI env: run in docker with the same image and the same env vars, and use the same worker count as CI (e.g. --maxWorkers=4)
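
The timezone difference is a common local-vs-CI trap: local getters such as getDate() depend on the machine's time zone, while toISOString() always renders in UTC. A minimal sketch, with utcDayKey as a hypothetical helper name:

```typescript
// Hypothetical helper: a calendar-day key computed in UTC,
// independent of the time zone of the machine running the test.
function utcDayKey(d: Date): string {
  return d.toISOString().slice(0, 10); // toISOString always renders in UTC
}

// 23:30 UTC is already the next calendar day in any zone at UTC+1 or later,
// so a key built from local getters would differ between a laptop and a UTC runner.
const lateEvening = new Date("2026-05-25T23:30:00Z");
const dayKey = utcDayKey(lateEvening); // "2026-05-25" on every machine
```

Setting the TZ environment variable to the same value locally and in CI is another way to remove this difference.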

Common Pitfalls

sleep to solve races: timing varies per machine, so a sleep just creates new flakiness. Use waitFor / findBy*.
Ignore flaky + auto-retry: CI trust → 0 → real bugs missed → production incidents.
Shared DB: no test isolation. Use transaction rollback or schema-per-test.
Direct Math.random / Date.now usage: mock or inject.
No flakiness metrics: you can't tell which tests are flaky or how often. Dashboard the retry rate in CI.
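
A retry-rate metric can be sketched as below. The TestRun record shape is an assumption about what a CI system might export, and the sample history is made-up illustration data, not real results.

```typescript
// Hypothetical record shape: one entry per test execution exported from CI.
interface TestRun { test: string; attempts: number; passed: boolean; }

interface FlakeStats { runs: number; retriedPasses: number; flakeRate: number; }

// A run counts as flaky when it passed, but only after at least one retry.
function flakeRates(history: readonly TestRun[]): Map<string, FlakeStats> {
  const stats = new Map<string, FlakeStats>();
  for (const run of history) {
    const s = stats.get(run.test) ?? { runs: 0, retriedPasses: 0, flakeRate: 0 };
    s.runs += 1;
    if (run.passed && run.attempts > 1) s.retriedPasses += 1;
    s.flakeRate = s.retriedPasses / s.runs;
    stats.set(run.test, s);
  }
  return stats;
}

// Made-up sample data for illustration only.
const sampleHistory: TestRun[] = [
  { test: "clicks button", attempts: 1, passed: true },
  { test: "clicks button", attempts: 2, passed: true },
  { test: "clicks button", attempts: 1, passed: true },
  { test: "clicks button", attempts: 3, passed: true },
  { test: "token expires", attempts: 1, passed: true },
];

const rates = flakeRates(sampleHistory);
// "clicks button": 2 of 4 runs needed a retry (0.5); "token expires": 0 of 1 (0)
```

Sorting tests by this rate gives a quarantine and fix order instead of a vague sense that "CI is flaky".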

Wrap-up

Flaky tests aren't "tests being random"; they are real bugs in the code or in the tests. Retries are a temporary measure; the real fix is to identify which of the five causes applies and eliminate it.

In practice: dashboard CI retry rates, quarantine flaky tests and open an issue as soon as one is detected, and allow at most 1 retry (the build recovers if the retry passes and fails for real if it does not). Mock Math.random, Date.now, and real network access deliberately.