This regex is scarier than it looks:
pattern: /^(a+)+$/
input: "aaaaaaaaaaaaaaaaaaaaaaaaa!" (25 a's then a !)
In JavaScript / Python / Java the match attempt takes:
20 chars: 1 sec
25 chars: 30 sec
30 chars: 30 min
35 chars: 18 hours (!)Innocent-looking patterns can run in exponential time on carefully crafted input — a ReDoS (Regular Expression Denial of Service) attack. This guide explains backtracking, the "catastrophic" patterns, and what linear-time engines like RE2 do differently.
Why regex can be slow — backtracking
Most languages (JavaScript, Python, Perl, Java, .NET, Ruby) use backtracking NFA engines. In short:
- The engine tries each possible matching path
- On failure, it "backtracks" and tries another
- A failed match has to exhaust every possibility first
pattern: /a+a+/
input: "aaaab"
Engine tries:
1. greedy a+ matches "aaaa" → no a left for the second a+ → backtrack
2. a+ matches "aaa" → next a+ matches "a" → next char is "b" → fail → backtrack
3. a+ matches "aa" → next a+ matches "a" or "aa" → "b" doesn't match → backtrack
... continuesSimple in isolation; explosive when nested.
Catastrophic backtracking — the (a+)+ secret
pattern: /^(a+)+$/
input: "aaaa!"
(a+)+ means:
outer + : the group at least once
inner a+: at least one a
This is ambiguous — how to partition "aaaa" into groups?
[aaaa] 1 group
[aaa][a] 2 groups
[aa][aa] 2 groups
[aa][a][a] 3 groups
[a][aaa] 2 groups
[a][aa][a] 3 groups
[a][a][aa] 3 groups
[a][a][a][a] 4 groups
Partition count for n a's = 2^(n-1)
The trailing ! never matches → backtrack through every partition → exponential timeThat's catastrophic backtracking. Polynomial or exponential time on input length.
ReDoS — real incidents
Cloudflare, July 2019
A WAF rule contained a catastrophic regex. A specific input burned 100% CPU across Cloudflare edge servers — 27-minute worldwide outage. The root cause was a regex.
Stack Exchange, 2016
A moderation regex went catastrophic on a single user's post, taking the site down.
Pattern — a server-side regex applied to user input. Attackers send crafted input to cause polynomial or exponential matching.
Dangerous patterns to watch for
1. Nested quantifier — (a+)+ family
(a+)+
(a*)*
(a|aa)+ ← alternation + repetition
(a|a)* ← duplicate alternatives2. Overlapping alternation — (a|ab)+
(a|ab)+ ← "ab" matches both ways
→ backtracking explosion3. Overlapping repetition — a+a+
a+a+ ← where does the first a+ end?
→ many ambiguous splits → backtracking4. Greedy / lazy combinations
<.+>.*</.+> ← can blow up on unbalanced inputs
<.+?> ← lazy is usually safer than greedy hereSafer constructs — atomic groups and possessive quantifiers
Atomic group (?>...)
# Normal group, can backtrack
(a+)+ ← catastrophic
# Atomic — once the group matches, it never gives back
(?>a+)+ ← safeAtomic groups commit to a match. Supported in Perl, .NET, PCRE, Python 3.11+, JavaScript 2025+.
Possessive quantifiers (a++, a*+)
a++ ← greedy + no backtracking
a*+
# normal vs possessive
a+a ← matches "aa" (a+ gives back the last a)
a++a ← fails (a++ takes everything and won't give back)Java, PCRE, Perl support these. JavaScript doesn't (as of 2024).
Bound the input
// Simplest defense
if (userInput.length > 1000) throw new Error("too long");
const match = pattern.exec(userInput);
// Or timeout (some Node libraries support it)
const match = await pattern.execWithTimeout(userInput, 100);Linear-time engines — RE2 and friends
Google's RE2 (Ken Thompson + Rob Pike) doesn't backtrack. NFA / DFA simulation runs in time linear in the input. ReDoS is impossible.
Trade-offs:
- No backreferences (
\1) - Limited lookbehind
- POSIX-style features only
Where you'll see it:
- Go's standard
regexp— RE2-based - Rust's
regexcrate — RE2-inspired with extensions - Python's
re2binding (separate package) - JavaScript X — V8's engine is backtracking
The browser regex tester trap
Test a catastrophic pattern with a long input in any browser regex tester and the tab freezes. Regex Tester already debounces input, but very large inputs to a malicious pattern are still risky. On the server, always add a timeout.
ReDoS detection tools
- safe-regex (Node, npm) — static analyzer for catastrophic patterns
- recheck — Java/JavaScript regex ReDoS audit
- regexploit — Python-based regex fuzzer
Wire one into CI to scan dependencies and your own regexes.
Examples — safe vs risky
# email — common ReDoS
^([a-zA-Z0-9]+)*@ ← risky ((..+)* pattern)
^[a-zA-Z0-9]+@ ← safe
# URL path
^(/[a-z]+)+$ ← risky (nested +)
^/(?:[a-z]+/)*[a-z]+$ ← safer (non-capturing, atomic-ish)
# JSON quoted string
"(.*?)" ← typically safe (lazy)
"(\\.|[^"])*" ← potentially risky (alternation inside *)
# date
^(\d+)+-(\d+)+$ ← risky (nested +)
^\d+-\d+$ ← safeDefense strategy
- Avoid nested quantifiers — review patterns for (a+)+ family before deploying.
- Bound input length — check
lengthbefore regex on user input. - Add timeouts — use libraries / middleware that abort long matches.
- Use a linear engine where possible — Go / Rust / RE2. Hard for JavaScript without server help.
- Run static analyzers in CI — safe-regex / recheck.
- Last resort — drop regex for string methods (
indexOf,split) when feasible.
Common pitfalls
1. Server-side validation regex
Email / username / password patterns are the most common ReDoS surface. Audit the OWASP catalog.
2. Log parsing
nginx / Apache log parsers apply regex per line. One catastrophic line in a 1 MB log freezes the parser.
3. CSP / WAF rules
Regex in WAF or intrusion detection rules. The attacker takes down the security layer itself (Cloudflare 2019).
4. Markdown / wiki parsers
Nested markdown syntax via regex can blow up. Discord / Slack have shipped ReDoS fixes over the years.
5. JavaScript has no timeout
V8 doesn't expose a regex timeout. A catastrophic match blocks the main thread. setTimeout can't interrupt a sync regex.
Mitigations — run regex in a Worker thread, or offload to a Node service that uses RE2.
References
- OWASP — ReDoS — OWASP
- Cloudflare's July 2019 outage — postmortem
- Russ Cox — Regular Expression Matching Can Be Simple And Fast — swtch.com
- RE2 syntax — GitHub
Summary
- Most languages run backtracking NFAs. Some patterns turn exponential on input length.
- Catastrophic backtracking — (a+)+ family — happens because the engine tries every partition on a failed match.
- ReDoS = crafted input to take a service down via a regex. Cloudflare 2019 is the famous example.
- Watch for nested quantifiers, overlapping alternation, overlapping repetition, greedy / lazy abuse, unbounded repeats.
- Defenses — atomic groups / possessive quantifiers / bounded input / timeouts / RE2.
- RE2 (Go / Rust) guarantees linear time at the cost of features like backreferences.
- JavaScript has no regex timeout — Worker threads or server- side offload work around it.
- Try it carefully: Regex Tester can freeze your tab on a catastrophic pattern + long input.