Skip to content
yutils

Why Regex Sometimes Hangs (ReDoS Explained)

Regex engines that backtrack can take exponential time on innocent-looking patterns like (a+)+. Learn the NFA/backtracking model, the ReDoS attack vector, and the linear-time engines (RE2) that prevent it.

~8 min read

This regex is scarier than it looks:

pattern: /^(a+)+$/
input:   "aaaaaaaaaaaaaaaaaaaaaaaaa!"  (25 a's then a !)

In JavaScript / Python / Java the match attempt takes:
  20 chars: 1 sec
  25 chars: 30 sec
  30 chars: 30 min
  35 chars: 18 hours (!)

Innocent-looking patterns can run in exponential time on carefully crafted input — a ReDoS (Regular Expression Denial of Service) attack. This guide explains backtracking, the "catastrophic" patterns, and what linear-time engines like RE2 do differently.

Why regex can be slow — backtracking

Most languages (JavaScript, Python, Perl, Java, .NET, Ruby) use backtracking NFA engines. In short:

  • The engine tries each possible matching path
  • On failure, it "backtracks" and tries another
  • A failed match has to exhaust every possibility first
pattern: /a+a+/
input:   "aaaab"

Engine tries:
1. greedy a+ matches "aaaa" → no a left for the second a+ → backtrack
2. a+ matches "aaa" → next a+ matches "a" → next char is "b" → fail → backtrack
3. a+ matches "aa" → next a+ matches "a" or "aa" → "b" doesn't match → backtrack
... continues

Simple in isolation; explosive when nested.

Catastrophic backtracking — the (a+)+ secret

pattern: /^(a+)+$/
input:   "aaaa!"

(a+)+ means:
  outer + : the group at least once
  inner a+: at least one a

This is ambiguous — how to partition "aaaa" into groups?
  [aaaa]                       1 group
  [aaa][a]                     2 groups
  [aa][aa]                     2 groups
  [aa][a][a]                   3 groups
  [a][aaa]                     2 groups
  [a][aa][a]                   3 groups
  [a][a][aa]                   3 groups
  [a][a][a][a]                 4 groups

Partition count for n a's = 2^(n-1)

The trailing ! never matches → backtrack through every partition → exponential time

That's catastrophic backtracking. Polynomial or exponential time on input length.

ReDoS — real incidents

Cloudflare, July 2019

A WAF rule contained a catastrophic regex. A specific input burned 100% CPU across Cloudflare edge servers — 27-minute worldwide outage. The root cause was a regex.

Stack Exchange, 2016

A moderation regex went catastrophic on a single user's post, taking the site down.

Pattern — a server-side regex applied to user input. Attackers send crafted input to cause polynomial or exponential matching.

Dangerous patterns to watch for

1. Nested quantifier — (a+)+ family

(a+)+
(a*)*
(a|aa)+      ← alternation + repetition
(a|a)*       ← duplicate alternatives

2. Overlapping alternation — (a|ab)+

(a|ab)+      ← "ab" matches both ways
                  → backtracking explosion

3. Overlapping repetition — a+a+

a+a+         ← where does the first a+ end?
                  → many ambiguous splits → backtracking

4. Greedy / lazy combinations

<.+>.*</.+>   ← can blow up on unbalanced inputs
<.+?>         ← lazy is usually safer than greedy here

Safer constructs — atomic groups and possessive quantifiers

Atomic group (?>...)

# Normal group, can backtrack
(a+)+        ← catastrophic

# Atomic — once the group matches, it never gives back
(?>a+)+      ← safe

Atomic groups commit to a match. Supported in Perl, .NET, PCRE, Python 3.11+, JavaScript 2025+.

Possessive quantifiers (a++, a*+)

a++          ← greedy + no backtracking
a*+

# normal vs possessive
a+a          ← matches "aa" (a+ gives back the last a)
a++a         ← fails (a++ takes everything and won't give back)

Java, PCRE, Perl support these. JavaScript doesn't (as of 2024).

Bound the input

// Simplest defense
if (userInput.length > 1000) throw new Error("too long");
const match = pattern.exec(userInput);

// Or timeout (some Node libraries support it)
const match = await pattern.execWithTimeout(userInput, 100);

Linear-time engines — RE2 and friends

Google's RE2 (Ken Thompson + Rob Pike) doesn't backtrack. NFA / DFA simulation runs in time linear in the input. ReDoS is impossible.

Trade-offs:

  • No backreferences (\1)
  • Limited lookbehind
  • POSIX-style features only

Where you'll see it:

  • Go's standard regexp — RE2-based
  • Rust's regex crate — RE2-inspired with extensions
  • Python's re2 binding (separate package)
  • JavaScript X — V8's engine is backtracking

The browser regex tester trap

Test a catastrophic pattern with a long input in any browser regex tester and the tab freezes. Regex Tester already debounces input, but very large inputs to a malicious pattern are still risky. On the server, always add a timeout.

ReDoS detection tools

  • safe-regex (Node, npm) — static analyzer for catastrophic patterns
  • recheck — Java/JavaScript regex ReDoS audit
  • regexploit — Python-based regex fuzzer

Wire one into CI to scan dependencies and your own regexes.

Examples — safe vs risky

# email — common ReDoS
^([a-zA-Z0-9]+)*@                         ← risky ((..+)* pattern)
^[a-zA-Z0-9]+@                            ← safe

# URL path
^(/[a-z]+)+$                              ← risky (nested +)
^/(?:[a-z]+/)*[a-z]+$                     ← safer (non-capturing, atomic-ish)

# JSON quoted string
"(.*?)"                                   ← typically safe (lazy)
"(\\.|[^"])*"                           ← potentially risky (alternation inside *)

# date
^(\d+)+-(\d+)+$                         ← risky (nested +)
^\d+-\d+$                                ← safe

Defense strategy

  1. Avoid nested quantifiers — review patterns for (a+)+ family before deploying.
  2. Bound input length — check length before regex on user input.
  3. Add timeouts — use libraries / middleware that abort long matches.
  4. Use a linear engine where possible — Go / Rust / RE2. Hard for JavaScript without server help.
  5. Run static analyzers in CI — safe-regex / recheck.
  6. Last resort — drop regex for string methods (indexOf, split) when feasible.

Common pitfalls

1. Server-side validation regex

Email / username / password patterns are the most common ReDoS surface. Audit the OWASP catalog.

2. Log parsing

nginx / Apache log parsers apply regex per line. One catastrophic line in a 1 MB log freezes the parser.

3. CSP / WAF rules

Regex in WAF or intrusion detection rules. The attacker takes down the security layer itself (Cloudflare 2019).

4. Markdown / wiki parsers

Nested markdown syntax via regex can blow up. Discord / Slack have shipped ReDoS fixes over the years.

5. JavaScript has no timeout

V8 doesn't expose a regex timeout. A catastrophic match blocks the main thread. setTimeout can't interrupt a sync regex.

Mitigations — run regex in a Worker thread, or offload to a Node service that uses RE2.

References

  • OWASP — ReDoS — OWASP
  • Cloudflare's July 2019 outage — postmortem
  • Russ Cox — Regular Expression Matching Can Be Simple And Fast — swtch.com
  • RE2 syntax — GitHub

Summary

  • Most languages run backtracking NFAs. Some patterns turn exponential on input length.
  • Catastrophic backtracking — (a+)+ family — happens because the engine tries every partition on a failed match.
  • ReDoS = crafted input to take a service down via a regex. Cloudflare 2019 is the famous example.
  • Watch for nested quantifiers, overlapping alternation, overlapping repetition, greedy / lazy abuse, unbounded repeats.
  • Defenses — atomic groups / possessive quantifiers / bounded input / timeouts / RE2.
  • RE2 (Go / Rust) guarantees linear time at the cost of features like backreferences.
  • JavaScript has no regex timeout — Worker threads or server- side offload work around it.
  • Try it carefully: Regex Tester can freeze your tab on a catastrophic pattern + long input.
Back to guides