The same test passes one run, fails the next. CI trust slowly erodes → "probably flaky again" → real bugs slip through. This guide covers the five root causes of flaky tests, why "just retry" is the wrong fix, and how to find and kill them systematically.
What Flaky Really Means — Non-Deterministic
Normal test:
input → same output → always the same result (pass or fail)
Flaky test:
input → different output → result varies
Cause: something non-deterministic in the test.
- Time (Date.now)
- Random (Math.random, UUID v4)
- Concurrency (race conditions)
- External state (network, file, env)
- Order dependency (shared state)
→ "the test fails randomly" is false — only the trigger is random;
there's always a cause.
Five Root Causes
1. Race Conditions (most common)
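// Doesn't wait for async work
// (assumes Testing Library: `screen`, `user = userEvent.setup()`, and an already rendered `button`)
test("clicks button", async () => {
  await user.click(button);
  expect(screen.getByText("Success")).toBeVisible(); // ← race: asserts before "Success" renders
});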
Problem: "Success" takes a few milliseconds to render, so the assertion can run
before it appears. Passes on fast machines, fails on slow CI / loaded runners.
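The race can be reproduced without a DOM. The sketch below is only an illustration: `waitForCondition` and `createFakeScreen` are hypothetical helpers, not Testing Library APIs. It shows the idea behind a polling wait: re-check the condition until it holds or a timeout elapses, instead of asserting once right after the click.

```typescript
// Hypothetical helper: re-check a condition until it holds or the timeout elapses.
async function waitForCondition(
  check: () => boolean,
  timeoutMs = 1000,
  intervalMs = 10,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!check()) {
    if (Date.now() > deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical "UI": the message appears some milliseconds after the click.
function createFakeScreen(renderDelayMs: number) {
  let text = "";
  return {
    click: (): void => {
      setTimeout(() => {
        text = "Success";
      }, renderDelayMs);
    },
    getText: (): string => text,
  };
}

async function demo(): Promise<{ immediate: string; afterWait: string }> {
  const fakeScreen = createFakeScreen(30);
  fakeScreen.click();
  const immediate = fakeScreen.getText(); // read too early: still empty
  await waitForCondition(() => fakeScreen.getText() === "Success");
  return { immediate, afterWait: fakeScreen.getText() };
}
```

The wait is bounded by the condition rather than by a guessed delay, so it finishes as soon as the text appears and only fails when the timeout is genuinely exceeded.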
Fixes:
- Use waitFor
  await waitFor(() => expect(screen.getByText("Success")).toBeVisible());
- findBy* (auto-waits)
  await screen.findByText("Success");
- Explicit sleep (last resort, anti-pattern)
2. Order Dependency
test("A creates user 1", () => {
  db.users.insert({id: 1, name: "A"});
});
test("B finds user 1", () => {
  const u = db.users.findById(1); // ← needs A to have run first
  expect(u.name).toBe("A");
});
Problem:
- Parallel or random test order → B runs first → fails
- One test's setup leaks into another
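A minimal sketch of the order-independent version of the two tests above, assuming an in-memory `UserStore` as a hypothetical stand-in for `db.users`. Each test gets a fresh store and seeds the rows it needs, so the result does not change when the order is reversed.

```typescript
type User = { id: number; name: string };

// Hypothetical in-memory stand-in for the `db.users` store used above.
class UserStore {
  private rows = new Map<number, User>();
  insert(user: User): void {
    this.rows.set(user.id, user);
  }
  findById(id: number): User | undefined {
    return this.rows.get(id);
  }
}

type TestCase = { name: string; run: (db: UserStore) => void };

const tests: TestCase[] = [
  {
    name: "A creates user 1",
    run: (db) => {
      db.insert({ id: 1, name: "A" });
      if (db.findById(1)?.name !== "A") throw new Error("insert failed");
    },
  },
  {
    name: "B finds user 1",
    run: (db) => {
      db.insert({ id: 1, name: "A" }); // own setup, no reliance on test A
      if (db.findById(1)?.name !== "A") throw new Error("lookup failed");
    },
  },
];

// Run the tests in the given order, handing each one a brand-new store.
function runAll(order: TestCase[]): string[] {
  const failures: string[] = [];
  for (const t of order) {
    try {
      t.run(new UserStore());
    } catch {
      failures.push(t.name);
    }
  }
  return failures;
}
```

Calling `runAll(tests)` and `runAll([...tests].reverse())` should both return an empty failure list, which is the property a randomized test order checks for.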
Fixes:
- Each test owns setup + teardown
- DB reset in beforeEach
- Transaction-rollback pattern (rollback at test end)
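- Module isolation inside a test file (jest.isolateModules or jest.resetModules; isolatedModules is a TypeScript / ts-jest transpilation option and does not isolate tests)
3. Time-Dependent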
test("token expires after 1 hour", () => {
  const token = createToken();
  // wait 1 hour... impossible
  // or hack token.createdAt directly (implementation coupling)
  expect(token.isValid()).toBe(false); // how?
});
Problems:
- Real-time dependence = 1-hour test
- new Date() usage breaks at midnight / DST boundaries
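One way to remove the real-time dependence is to pass the clock in. The sketch below is an assumption about how `createToken` could look, not the original implementation: `now` returns epoch milliseconds, and `createFakeClock` is a hypothetical test helper.

```typescript
type Clock = () => number; // returns epoch milliseconds

const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical token factory: the clock is an option instead of a hidden Date.now() call.
function createToken(opts: { now: Clock } = { now: Date.now }) {
  const createdAt = opts.now();
  return {
    isValid: (): boolean => opts.now() - createdAt <= ONE_HOUR_MS,
  };
}

// Hypothetical controllable clock for tests.
function createFakeClock(startIso: string) {
  let current = Date.parse(startIso);
  return {
    now: (): number => current,
    advance: (ms: number): void => {
      current += ms;
    },
  };
}

const clock = createFakeClock("2026-05-25T00:00:00Z");
const token = createToken({ now: clock.now });
const validAtStart = token.isValid(); // checked at creation time
clock.advance(ONE_HOUR_MS + 1); // jump past the expiry without waiting
const validAfterExpiry = token.isValid();
```

The test finishes instantly and gives the same answer at midnight, across DST changes, and on any machine, because no wall-clock time is read.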
Fixes:
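- Mock time (jest.useFakeTimers, sinon.useFakeTimers)
  jest.useFakeTimers(); // must be enabled before setSystemTime; modern fake timers also mock Date
  jest.setSystemTime(new Date("2026-05-25T00:00:00Z"));
  const token = createToken();
  jest.advanceTimersByTime(60 * 60 * 1000 + 1); // 1 hour + 1 ms
  expect(token.isValid()).toBe(false);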
- Inject time (clock as a function arg)
  createToken({now: () => "2026-05-25T00:00:00Z"});
4. Network / External Service
test("fetches from API", async () => {
  const data = await fetch("https://api.example.com/users").then(r => r.json());
  expect(data.length).toBeGreaterThan(0);
});
Problems:
- External API down or slow → fail
- Response data varies
- Rate-limited
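The external dependency can be cut without any mocking library by injecting the HTTP function. This is a sketch under assumptions: `listUsers`, `FetchLike`, and `fakeFetch` are hypothetical names, and the response shape is invented for illustration.

```typescript
type ApiUser = { id: number; name: string };

// Minimal shape of the fetch function the code under test needs.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Hypothetical function under test: the HTTP client is a parameter.
async function listUsers(fetchFn: FetchLike): Promise<ApiUser[]> {
  const res = await fetchFn("https://api.example.com/users");
  if (!res.ok) throw new Error("request failed");
  return (await res.json()) as ApiUser[];
}

// Test double: returns a canned response and never touches the network.
const fakeFetch: FetchLike = async (url) => ({
  ok: url.endsWith("/users"),
  json: async () => [{ id: 1, name: "A" }],
});
```

In production code the real `fetch` would be passed in; in the test, `listUsers(fakeFetch)` always resolves to the same single user, so the assertion can check exact values instead of `length > 0`.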
Fixes:
- HTTP mock (nock, msw)
- Real network for one integration test only, mock the rest
- Contract tests separately verify the spec
- Explicit timeouts + explicit retries (retries the app performs by design, not test-runner auto-retry)
5. Shared State / Pollution
// Module-level state
let cachedUser = null;
function getUser() { ... cachedUser = ...; }
test("A", () => { getUser(); /* pollutes cachedUser */ });
test("B", () => { /* assumes empty cache */ }); // ← fails if A ran first
Problems:
- Module-level vars, singletons, env vars, file system, ...
- One test's side effects influence another
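The pollution above disappears once the module exposes a way to reset its state and every test ends with that reset. A minimal sketch, assuming a hypothetical `resetUserCache` hook that an `afterEach` would call:

```typescript
// Module-level cache, as in the example above.
let cachedUser: { name: string } | null = null;
let loadCount = 0; // counts how often the "expensive" load actually runs

function getUser(): { name: string } {
  if (cachedUser === null) {
    loadCount += 1;
    cachedUser = { name: "A" };
  }
  return cachedUser;
}

// Hypothetical reset hook, meant to be called from afterEach.
function resetUserCache(): void {
  cachedUser = null;
  loadCount = 0;
}

// Simulate two tests, each followed by the reset an afterEach would perform.
function runTestA(): number {
  getUser();
  getUser(); // second call is served from the cache
  return loadCount;
}
function runTestB(): boolean {
  return cachedUser === null; // B assumes an empty cache
}

const loadsInA = runTestA();
resetUserCache();
const cacheEmptyForB = runTestB();
resetUserCache();
```

Without the `resetUserCache()` call between the two simulated tests, `runTestB` would see A's cached user, which is exactly the failure that only appears in some orders.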
Fixes:
- afterEach reset (clear cache, restore env, delete files)
- jest.resetModules() in beforeEach (modules required afterwards get a fresh instance)
- Single responsibility (each test depends only on its own setup)
Why Auto-Retry Is the Wrong Fix
// CI config
retries: 3 // flaky? retry 3 times, one pass = OK
→ Effectively "hiding" flaky tests.
Problem 1 — Erodes CI trust:
"Failed? Just retry" becomes habit
Real bugs are masked by retries
→ "this test failure is random" mental model
→ real failures ignored
Problem 2 — Root cause not fixed:
Flaky root cause is often a real bug (race, leak, ...)
Retries hide the bug; the same race hits real users in production
Problem 3 — Slow CI:
Every retry re-runs the test, so more flaky tests → CI takes longer
→ Retry = temporary measure. Permanent use is an anti-pattern.
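A little arithmetic shows how thoroughly retries hide a flaky test. The sketch assumes that attempts fail independently with the same probability `p` and that `retries: 3` means up to 4 attempts in total; both are simplifying assumptions, and real CI tools may count differently.

```typescript
// Probability that the test is finally reported as failed:
// every one of the (retries + 1) attempts has to fail.
function reportedFailureRate(p: number, retries: number): number {
  return Math.pow(p, retries + 1);
}

// Expected number of attempts per execution: attempt k + 1 only runs
// if the first k attempts all failed.
function expectedAttempts(p: number, retries: number): number {
  let total = 0;
  for (let k = 0; k <= retries; k++) {
    total += Math.pow(p, k);
  }
  return total;
}

const hiddenRate = reportedFailureRate(0.05, 3); // 0.05^4 = 0.00000625
const attempts = expectedAttempts(0.05, 3); // 1 + 0.05 + 0.0025 + 0.000125 = 1.052625
```

Under these assumptions a test that fails 5% of the time turns the build red only about 6 times in a million runs. The flakiness vanishes from view while the underlying race or leak is still there.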
Correct approach:
1. Quarantine the failing test (isolated group, doesn't affect main suite)
2. Find the root cause (next section)
3. Unquarantine after the fix
Systematic Hunting
1. Run the Test 1000 Times
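# Just the suspected test, 1000 times (the jest command alone runs it once, so loop in the shell)
for i in $(seq 1 1000); do
  npx jest --runTestsByPath path/to/test.spec.ts --testNamePattern "flaky one" \
    --maxWorkers=1 || echo "run $i failed" >> failures.log
done
# 950 pass / 50 fail → 5% flaky
# Compare stack traces / logs at fail time
# Vitest: use the per-test repeats option, e.g. test("flaky one", { repeats: 100 }, ...)
vitest run path/to/test
2. Randomize Test Order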
# Shuffle test order within each file; rerun with --seed=<n> to reproduce a failing order
jest --randomize --showSeed
→ Order dependencies show up as patterned failures.
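Why a seed matters: a shuffle driven by a seeded generator produces the same order every time for the same seed, so a failing order can be replayed. The sketch below is illustrative only. It is not how Jest or Vitest implement shuffling, and `mulberry32` is just one small PRNG chosen for brevity.

```typescript
// Small deterministic PRNG (mulberry32); any seeded generator would do here.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator.
function shuffleWithSeed<T>(items: readonly T[], seed: number): T[] {
  const rand = mulberry32(seed);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}
```

Calling `shuffleWithSeed(testNames, 42)` twice gives the same order both times, which is the property that makes "fails with seed N" a reproducible bug report rather than an anecdote.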
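Tools: jest --randomize needs the default jest-circus runner; Vitest has the sequence.shuffle and sequence.seed options (fileParallelism controls parallel file execution, not order)
3. Vary Worker Count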
jest --maxWorkers=1 # serial
jest --maxWorkers=4 # 4 parallel workers
→ Failure pattern changing with maxWorkers = concurrency issue
→ Passes only with maxWorkers=1 = state shared across test files (DB, ports, temp files)
4. CI-Specific
Locally passes / CI fails →
- env var differences
- resource (CPU/memory) differences → timing
- network differences
- timezone (UTC vs local) differences
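The timezone difference is easy to demonstrate. In this sketch the instant is fixed, but the local calendar day depends on the machine's timezone, while the UTC-based values do not.

```typescript
// 23:30 UTC on 25 May is already 26 May in timezones far enough east of UTC.
const instant = new Date("2026-05-25T23:30:00Z");

// Depends on the machine's timezone: 25 on a UTC runner, 26 in e.g. Asia/Tokyo.
const localDay = instant.getDate();

// Independent of the machine's timezone.
const utcDay = instant.getUTCDate();
const isoDate = instant.toISOString().slice(0, 10);
```

A test asserting on `localDay` can pass on a developer laptop and fail on a UTC CI runner (or the other way around). Asserting on the UTC values, or running the suite locally with the `TZ` environment variable set to the CI timezone, removes that difference.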
Reproduce CI env: docker with the same image, same env, --maxWorkers=4
Common Pitfalls
- sleep to solve races: timing varies per machine, so a sleep just creates new flakiness. Use waitFor / findBy*.
- Ignoring flaky tests + auto-retry: CI trust → 0 → real bugs missed → production incidents.
- Shared DB: no test isolation. Use transaction rollback or schema-per-test.
- Direct Math.random / Date.now usage: mock or inject.
- No flakiness metrics: you can't tell which tests are flaky or how often they fail. Dashboard the retry rate in CI.
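A retry-rate metric can be computed from whatever per-test records the CI system exports. The sketch below assumes a hypothetical `TestRun` record with `test`, `attempts`, and `passed` fields; real CI exports will use different names.

```typescript
// Hypothetical record of one test execution in CI (field names are assumptions).
type TestRun = { test: string; attempts: number; passed: boolean };

type FlakeStats = { runs: number; retried: number; retryRate: number };

// Per test: the share of runs that only passed after at least one retry.
function retryRates(runs: TestRun[]): Map<string, FlakeStats> {
  const stats = new Map<string, FlakeStats>();
  for (const run of runs) {
    const s = stats.get(run.test) ?? { runs: 0, retried: 0, retryRate: 0 };
    s.runs += 1;
    if (run.passed && run.attempts > 1) s.retried += 1;
    s.retryRate = s.retried / s.runs;
    stats.set(run.test, s);
  }
  return stats;
}

const sample: TestRun[] = [
  { test: "clicks button", attempts: 1, passed: true },
  { test: "clicks button", attempts: 2, passed: true },
  { test: "clicks button", attempts: 1, passed: true },
  { test: "clicks button", attempts: 3, passed: true },
  { test: "fetches from API", attempts: 1, passed: true },
];
const rates = retryRates(sample);
```

In this sample, "clicks button" needed a retry in 2 of 4 runs (rate 0.5) and "fetches from API" never did (rate 0). Sorting tests by this rate gives a ranked list of what to quarantine and fix first.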
Wrap-up
Flaky tests aren't "tests being random" but real bugs in code or tests. Retries are temporary; the real fix identifies which of the five causes applies and eliminates it.
In practice: dashboard CI retry rates, quarantine flaky tests and open an issue as soon as one is detected, and allow at most 1 retry (the build recovers if the retry passes; a second failure is a real failure). Mock Math.random, Date.now, and the real network deliberately.