Regex looks like an alien language at first glance, but thirty minutes of patient reading covers 80% of the text-validation, extraction, and replacement work you'll ever need. This guide moves from basics through groups and lookaround into the patterns you must never ship (catastrophic backtracking). Paste each example into Regex Tester as you read.
Literals vs metacharacters
Most characters match themselves. The pattern cat matches the string "cat". Twelve characters are special:
. ^ $ * + ? ( ) [ ] { } | \Escape them with \ to match the literal — e.g. \. for a literal period.
Character classes
Built-in
| Pattern | Meaning |
|---|---|
. | Any character except newline |
\d | Digit (0-9) |
\w | Word char ([A-Za-z0-9_]) |
\s | Whitespace (space, tab, newline) |
\D \W \S | Negations of the above |
Custom classes
[aeiou] # one vowel
[a-z] # one lowercase letter
[A-Za-z0-9] # one alphanumeric
[^abc] # not a, b, or c (^ negates)Most metacharacters are literal inside brackets — except ] \ ^ -.
Quantifiers
| Pattern | Meaning |
|---|---|
? | 0 or 1 |
* | 0 or more |
+ | 1 or more |
{n} | Exactly n |
{n,} | n or more |
{n,m} | n to m |
\d{3,4} matches 3 or 4 digits. https?:// matches "http://" or "https://".
Greedy vs lazy
Quantifiers are greedy by default — they match as much as possible. Add ? for lazy — as little as possible.
<.+> # greedy: matches <b>text</b> entirely
<.+?> # lazy: matches just <b>For HTML-tag-style "stop at the next closing character" patterns, lazy is almost always what you want.
Anchors
^— start of line$— end of line\b— word boundary (between a word char and a non-word char)
^[A-Z] matches lines starting with a capital letter. \bcat\b matches the word "cat" but not "category".
Groups and captures
Parentheses do two things: group a sub-pattern and capture the match.
(\d{4})-(\d{2})-(\d{2})Against "2026-05-16", group 1 = "2026", group 2 = "05", group 3 = "16". Access via match[1] in JS, m.group(1) in Python.
Named groups
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})JS: match.groups.year. Python: m.group('year').
Non-capturing groups
(?:https?|ftp)://Groups without capturing. Keeps the result-array indices clean.
Alternation
cat|dog|birdOne of the three. Usually combined with a group: (cat|dog|bird).
Lookaround
Checks "X must be followed/preceded by Y" without including Y in the match.
| Pattern | Meaning |
|---|---|
X(?=Y) | X if followed by Y (lookahead) |
X(?!Y) | X if not followed by Y (negative lookahead) |
(?<=Y)X | X if preceded by Y (lookbehind) |
(?<!Y)X | X if not preceded by Y (negative lookbehind) |
\d+(?= USD) matches "100" in "100 USD" but the unit "USD" itself stays out of the match.
Flags
i— case-insensitiveg— global (JavaScript)m— multiline (^/$match each line)s— dotAll (.matches newline)u— Unicode (JavaScript)
Real examples
Pragmatic email
^[^\s@]+@[^\s@]+\.[^\s@]+$Perfect RFC 5322 takes dozens of lines, but this covers 95% of real validation. The only true validation is "send a verification email and wait for the click".
URL
https?://[^\s/$.?#].[^\s]*Hangul (Korean syllables)
[가-힣]+With the Unicode flag, \p{Hangul} is more correct.
Slug validator
^[a-z0-9]+(-[a-z0-9]+)*$kebab-case slug — matches what Slug Generator produces.
Catastrophic backtracking — patterns to avoid
When a match fails, the regex engine backtracks to try alternatives. Some patterns explode exponentially in input length, turning a 100-character string into minutes of CPU — known as ReDoS (Regex Denial of Service). Real production incidents have brought down Cloudflare, Stack Overflow, and others.
Dangerous pattern 1: nested quantifiers
(a+)+$Against "aaaaaaaaaaaaaaaaaaaaaaaa!" (matches fail), the engine tries every way to split the a's between the inner and outer + — exponential in length.
Dangerous pattern 2: alternation + greedy
(a|a)+$Same character in both alternatives. Each position now has two choices, multiplying at every step.
Mitigations
- No nested quantifiers.
(a+)+rewrites toa+. - Atomic groups / possessive quantifiers — PCRE and Java support
(?>a+)+ora++to disable backtracking. JavaScript does not. - Cap input length. For user-supplied patterns, enforce a max length (e.g. 10 KB).
- Match timeouts. Java needs a thread + interrupt pattern; Go's RE2 is immune by design (linear time, but no backreferences / lookbehind).
- Test before shipping. Run patterns through a ReDoS checker or measure runtime in Regex Tester.
Try it
- Regex Tester — pattern + flags + input, real-time match and group capture.
- String Case Converter — common case transforms without writing regex.
- Slug Generator — kebab-case slugs from mixed input.
Recap
- Memorize the twelve metacharacters; everything else is literal.
- Quantifiers are greedy by default. Add
?for lazy when you want "up to the next closer". - Groups serve two purposes: capture and grouping. Use
(?:…)when you don't need the capture. - Lookaround checks context without consuming characters — great for splitting and substitution.
- Never ship nested quantifiers like
(a+)+— that's ReDoS waiting to happen.