QR codes were invented in 1994 by Denso Wave in Japan for tracking automotive parts. They're now everywhere — menus, payments, tickets, Wi-Fi handoff. What looks like a grid of random black squares is actually a precisely engineered structure where finder patterns, error correction, and mask selection work together so a camera can read it from a tilted angle or a soiled surface. This guide walks through the key building blocks based on the ISO/IEC 18004 specification.
The big picture — function patterns + data area
A QR code splits into two regions:
- Function patterns — fixed cells the scanner uses to locate, orient, and size the code. Finder / alignment / timing / format / version.
- Encoding region — the data plus Reed-Solomon error correction codewords. Every cell that isn't a function pattern.
Versions range from 1 to 40. Version N is (4N + 17) × (4N + 17) cells: V1 = 21×21, V40 = 177×177.
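As a quick check of the size formula, here is a minimal sketch; the function name `qr_size` is made up for illustration.

```python
def qr_size(version: int) -> int:
    """Side length in cells for a QR version from 1 to 40."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 4 * version + 17


for v in (1, 2, 10, 40):
    side = qr_size(v)
    print(f"V{v}: {side}x{side}")
```

Each version step adds four cells per side, so the grid grows quickly even though the formula is linear.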
Finder patterns — the three big corner squares
███████
█     █
█ ███ █
█ ███ █
█ ███ █
█     █
███████
7×7 nested squares sit in three corners (top-left, top-right, bottom-left). Black outer (7×7) → white middle (5×5) → black center (3×3), with a 1:1:3:1:1 ratio across any line through the center.
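Image processing scans for that 1:1:3:1:1 ratio in any direction, which is how a scanner can find a QR code from a tilted phone. The three finder positions give the code's location, scale, and skew; a full perspective transform needs a fourth reference point, which decoders typically take from an alignment pattern or estimate for the bottom-right corner. The empty fourth corner identifies the code's rotation (which way is up).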
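The ratio test can be sketched on a single row of already-binarized pixels. This is only an illustration, not how any particular scanner is implemented: the function names and the 50% tolerance are assumptions, and a real decoder would repeat the check vertically and diagonally before accepting a candidate.

```python
from itertools import groupby


def run_lengths(row):
    """Collapse a row of 0/1 pixels (1 = dark) into (value, length) runs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def matches_finder_ratio(lengths, tolerance=0.5):
    """True if five consecutive run lengths approximate 1:1:3:1:1."""
    total = sum(lengths)
    if total < 7:
        return False
    unit = total / 7.0  # estimated width of one cell in pixels
    expected = (1, 1, 3, 1, 1)
    return all(
        abs(n - e * unit) <= e * unit * tolerance
        for n, e in zip(lengths, expected)
    )


def find_finder_candidates(row):
    """Pixel offsets where a dark-light-dark-light-dark run fits the ratio."""
    runs = run_lengths(row)
    hits, offset = [], 0
    for i, (value, length) in enumerate(runs):
        window = runs[i:i + 5]
        if value == 1 and len(window) == 5:
            if matches_finder_ratio([n for _, n in window]):
                hits.append(offset)
        offset += length
    return hits


# A finder slice drawn at 2 pixels per cell, with light padding on both sides.
row = [0] * 4 + [1] * 2 + [0] * 2 + [1] * 6 + [0] * 2 + [1] * 2 + [0] * 4
print(find_finder_candidates(row))
```

Because only the ratio matters, the same check works at any scale, which is why the pattern survives distance and tilt.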
Separator + alignment + timing
- Separator — 1-cell white border around each finder. Marks where the finder ends.
- Alignment patterns — 5×5 mini-squares that appear from Version 2 onward. For large codes, three corner finders aren't enough to keep perspective accurate across the middle of the grid; alignment patterns refine the mapping. V40 has 46 of them.
- Timing patterns — alternating black/white horizontal and vertical lines between finders. Lets the scanner measure the cell size precisely even when print/photo introduces small distortions.
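The alignment-pattern count can be reproduced with a little arithmetic. The sketch below relies on an assumption that is not stated above: alignment centers lie on a square grid of `version // 7 + 2` coordinates per axis, and the three grid positions that would overlap a finder are dropped. The function name is hypothetical.

```python
def alignment_pattern_count(version: int) -> int:
    """Number of alignment patterns in a QR code of the given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    if version == 1:
        return 0  # Version 1 has no alignment pattern
    per_axis = version // 7 + 2
    # The full grid minus the three positions occupied by finder patterns.
    return per_axis * per_axis - 3


for v in (1, 2, 7, 40):
    print(v, alignment_pattern_count(v))
```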
Format information — ECC level + mask
A 15-bit sequence stored in two copies next to the finders. It encodes which ECC level (L/M/Q/H) and which mask pattern (0-7) were applied: 5 data bits plus 10 check bits from a BCH (15, 5) code, so the value can be recovered even if part of the format region is damaged.
Scanner sequence: find finders → decode format → identify mask → XOR the mask out of the data region to recover the original bits.
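To make the format bits concrete, here is a minimal encoder-side sketch. It depends on details from the specification that the text above does not spell out: the 2-bit ECC indicators (L=01, M=00, Q=11, H=10), the BCH generator polynomial 0x537, and a fixed XOR mask 0x5412 applied to the final 15 bits. The function name is made up for illustration.

```python
ECC_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}
BCH_GENERATOR = 0x537      # degree-10 generator for BCH (15, 5)
FORMAT_XOR_MASK = 0x5412   # fixed pattern XORed onto the result


def format_bits(ecc_level: str, mask: int) -> int:
    """15-bit format information for an ECC level and mask pattern 0-7."""
    if not 0 <= mask <= 7:
        raise ValueError("mask pattern must be 0-7")
    data = (ECC_BITS[ecc_level] << 3) | mask
    # Polynomial long division of data * x^10 by the generator over GF(2).
    remainder = data << 10
    for shift in range(4, -1, -1):
        if remainder & (1 << (shift + 10)):
            remainder ^= BCH_GENERATOR << shift
    return ((data << 10) | remainder) ^ FORMAT_XOR_MASK


print(format(format_bits("L", 0), "015b"))
```

A decoder would go the other way: XOR the mask off, then pick the valid 15-bit word closest to what it read.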
Version information — large-code metadata
From Version 7 upward, an 18-bit version block sits next to the top-right and bottom-left finders. BCH (18, 6) protected. Encodes the 34 possible version numbers (7-40).
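The version block can be sketched the same way. The generator polynomial 0x1F25 comes from the specification rather than from the text above, and the function name is hypothetical.

```python
VERSION_GENERATOR = 0x1F25  # degree-12 generator for BCH (18, 6)


def version_bits(version: int) -> int:
    """18-bit version information for QR versions 7 through 40."""
    if not 7 <= version <= 40:
        raise ValueError("version information exists only for versions 7-40")
    # Polynomial long division of version * x^12 by the generator over GF(2).
    remainder = version << 12
    for shift in range(5, -1, -1):
        if remainder & (1 << (shift + 12)):
            remainder ^= VERSION_GENERATOR << shift
    return (version << 12) | remainder


print(format(version_bits(7), "018b"))
```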
Data encoding — four modes
QR uses one of four encoding modes per data segment. Bits per character differ, so the right mode squeezes more data into the same code:
- Numeric (0001) — digits only. 3 chars → 10 bits. Densest mode (V40 L fits 7,089 digits).
- Alphanumeric (0010) — digits + uppercase Latin + nine specials (space and $%*+-./:). 2 chars → 11 bits.
- Byte (0100) — arbitrary 8-bit bytes (UTF-8 etc.). 1 byte → 8 bits. URLs typically use this.
- Kanji (1000) — Shift-JIS 2-byte chars. 1 char → 13 bits. More efficient than byte mode for Japanese.
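A single code can switch modes mid-stream. An uppercase prefix followed by a long run of digits, for example, is shorter as an alphanumeric segment plus a numeric segment than as byte-only. Lowercase letters are outside the alphanumeric set, so the "abc" in "abc123" would still need byte mode. Every segment carries its own mode indicator and character count, so switching only pays off when the runs are long enough to recover that overhead. QR Code Generator picks the optimal split automatically.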
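The trade-off is easy to check with arithmetic. The sketch below counts bits for one segment in versions 1-9, where the character-count field is 10 bits for numeric, 9 for alphanumeric, and 8 for byte mode (values from the specification; only the byte-mode width is mentioned in this guide). The function name and the sample strings are made up for illustration.

```python
def segment_bits(mode: str, n_chars: int) -> int:
    """Bits for one segment in versions 1-9: mode indicator + count + payload."""
    if mode == "numeric":
        # Groups of 3 digits take 10 bits; a leftover 1 or 2 digits take 4 or 7.
        leftover = {0: 0, 1: 4, 2: 7}[n_chars % 3]
        return 4 + 10 + 10 * (n_chars // 3) + leftover
    if mode == "alphanumeric":
        # Pairs take 11 bits; a single leftover character takes 6.
        return 4 + 9 + 11 * (n_chars // 2) + 6 * (n_chars % 2)
    if mode == "byte":
        return 4 + 8 + 8 * n_chars
    raise ValueError(f"unsupported mode: {mode}")


# "ORDER-" followed by 12 digits: splitting wins.
split = segment_bits("alphanumeric", 6) + segment_bits("numeric", 12)
print(split, segment_bits("alphanumeric", 18), segment_bits("byte", 18))

# "ABC123": too short, the second segment header costs more than it saves.
short_split = segment_bits("alphanumeric", 3) + segment_bits("numeric", 3)
print(short_split, segment_bits("alphanumeric", 6), segment_bits("byte", 6))
```

For the long example the split costs 100 bits against 112 for alphanumeric-only and 156 for byte-only; for "ABC123" a single alphanumeric segment (46 bits) beats the split (54 bits).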
Error correction — the Reed-Solomon trick
The headline feature of QR. Damaged codes still scan. Four ECC levels:
| Level | Recoverable | Use case |
|---|---|---|
| L (Low) | ~7% | Clean screens (apps, web) |
| M (Medium) | ~15% | General print |
| Q (Quartile) | ~25% | Industrial / outdoor |
| H (High) | ~30% | Logo overlay / rough environments |
Reed-Solomon adds extra ECC codewords next to the data codewords. Higher ECC level = more ECC codewords = larger code for the same payload.
The math runs over a GF(2^8) Galois Field. Treat the data as coefficients of a polynomial, then locate and correct errors by solving an error-locator polynomial. The intuition: space valid codewords far enough apart that even a damaged codeword decodes to the closest valid one.
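The encoder side of this fits in a few dozen lines. What follows is a minimal sketch, not a production implementation: it assumes the field is built from the primitive polynomial 0x11D with generator element 2 (the values QR uses), all function names are hypothetical, and it only produces ECC codewords. Locating and correcting errors is considerably more involved and is not shown.

```python
# Log/antilog tables for GF(2^8) built from the primitive polynomial 0x11D.
EXP = [0] * 512
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 512):
    EXP[power] = EXP[power - 255]  # duplicated so gf_mul needs no modulo


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements; addition in this field is plain XOR."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n_ecc: int) -> list:
    """Coefficients (highest degree first) of (x + a^0)(x + a^1)...(x + a^(n-1))."""
    poly = [1]
    for i in range(n_ecc):
        nxt = [0] * (len(poly) + 1)
        for j, coef in enumerate(poly):
            nxt[j] ^= coef                      # coef * x
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # coef * a^i
        poly = nxt
    return poly


def rs_ecc(data: list, n_ecc: int) -> list:
    """ECC codewords: remainder of data(x) * x^n_ecc divided by the generator."""
    gen = generator_poly(n_ecc)
    work = list(data) + [0] * n_ecc
    for i in range(len(data)):
        coef = work[i]
        if coef:
            for j, g in enumerate(gen):
                work[i + j] ^= gf_mul(g, coef)
    return work[len(data):]


def poly_eval(poly: list, x: int) -> int:
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    result = 0
    for coef in poly:
        result = gf_mul(result, x) ^ coef
    return result


data = [0x40, 0x64, 0x86, 0x56, 0xC6, 0xC6, 0xF2, 0x10]  # arbitrary sample bytes
ecc = rs_ecc(data, 10)
codeword = data + ecc
# An undamaged codeword evaluates to zero at every root of the generator.
print([poly_eval(codeword, EXP[i]) for i in range(10)])
```

Those evaluations are the syndromes. A decoder computes them first: all zeros means nothing was damaged, and nonzero values feed the error-locator step described above.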
The logo trick
Centering a logo on top of a QR code works because ECC treats the obscured cells as "errors." With level H (~30%) you can cover up to roughly the central 25% and still scan.
Masking — pick the best of 8 patterns
Raw encoded data can produce big black blobs or sequences that confuse a scanner (e.g. resembling a finder). The QR spec XORs the data area with one of 8 mask patterns to balance the visual distribution. A cell is flipped wherever its pattern's condition holds (row and col are zero-based):
Mask 0: (row + col) % 2 == 0
Mask 1: row % 2 == 0
Mask 2: col % 3 == 0
Mask 3: (row + col) % 3 == 0
Mask 4: (floor(row/2) + floor(col/3)) % 2 == 0
Mask 5: (row*col)%2 + (row*col)%3 == 0
Mask 6: ((row*col)%2 + (row*col)%3) % 2 == 0
Mask 7: ((row+col)%2 + (row*col)%3) % 2 == 0
Each candidate mask is scored with four penalty rules (runs of 5+ same-color cells / 2×2 same-color blocks / finder-like sequences / black-vs-white balance). The encoder picks the lowest total penalty. Generators like QR Code Generator compute this for you.
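The eight conditions translate directly into code. In this sketch the grid is a list of rows of 0/1 values and `is_function` is a same-shaped grid of booleans marking function-pattern cells; both representations and the function name are assumptions for illustration. Penalty scoring is left out.

```python
# One predicate per mask pattern; row and col are zero-based.
MASKS = [
    lambda r, c: (r + c) % 2 == 0,
    lambda r, c: r % 2 == 0,
    lambda r, c: c % 3 == 0,
    lambda r, c: (r + c) % 3 == 0,
    lambda r, c: (r // 2 + c // 3) % 2 == 0,
    lambda r, c: (r * c) % 2 + (r * c) % 3 == 0,
    lambda r, c: ((r * c) % 2 + (r * c) % 3) % 2 == 0,
    lambda r, c: ((r + c) % 2 + (r * c) % 3) % 2 == 0,
]


def apply_mask(grid, is_function, mask_id):
    """Return a copy of grid with data cells flipped where the mask condition holds."""
    condition = MASKS[mask_id]
    return [
        [
            cell ^ 1 if not is_function[r][c] and condition(r, c) else cell
            for c, cell in enumerate(row)
        ]
        for r, row in enumerate(grid)
    ]


grid = [[0] * 6 for _ in range(6)]
no_function_cells = [[False] * 6 for _ in range(6)]
for row in apply_mask(grid, no_function_cells, 0):
    print("".join("█" if cell else " " for cell in row))
```

Since masking is an XOR, applying the same mask a second time restores the original bits, which is exactly what the scanner does after reading the mask number from the format information.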
End-to-end — encoding "Hello!"
- Mode selection — "Hello!" includes "!" so it uses byte mode (0100).
- Character count — V1-V9 byte mode uses an 8-bit count. "Hello!" = 6 chars → 00000110.
- Data bits — each ASCII byte expanded to 8 bits: 01001000 01100101 01101100 01101100 01101111 00100001.
- Terminator + padding — append 0000, pad to a byte boundary, then alternate 11101100 / 00010001 pad bytes until codewords are full.
- Reed-Solomon ECC — append ECC codewords determined by the chosen level.
- Module placement — skip function-pattern cells and zigzag the codeword bits into the rest.
- Mask selection — evaluate 8 masks, apply the lowest-penalty one.
- Format info — encode the ECC level + chosen mask into the 15-bit format region (BCH protected).
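The first four steps above (up to, but not including, Reed-Solomon) can be sketched as a single function. The name `byte_mode_codewords` is hypothetical, the 8-bit count limits it to versions 1-9, and the capacity of 16 data codewords is an assumption that corresponds to Version 1 at level M. Encoding the text as UTF-8 is also an assumption; for pure ASCII such as "Hello!" the bytes are the same either way.

```python
PAD_BYTES = (0b11101100, 0b00010001)


def byte_mode_codewords(text: str, capacity: int) -> list:
    """Data codewords for a single byte-mode segment (versions 1-9 only)."""
    payload = text.encode("utf-8")
    if len(payload) > 255:
        raise ValueError("an 8-bit character count holds at most 255 bytes")

    bits = "0100"                                  # byte mode indicator
    bits += format(len(payload), "08b")            # 8-bit character count
    bits += "".join(format(b, "08b") for b in payload)

    max_bits = capacity * 8
    if len(bits) > max_bits:
        raise ValueError("payload does not fit in the given capacity")

    bits += "0" * min(4, max_bits - len(bits))     # terminator, up to 4 zeros
    bits += "0" * (-len(bits) % 8)                 # pad to a byte boundary

    codewords = [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]
    pad_index = 0
    while len(codewords) < capacity:               # alternate the two pad bytes
        codewords.append(PAD_BYTES[pad_index % 2])
        pad_index += 1
    return codewords


print([format(cw, "02X") for cw in byte_mode_codewords("Hello!", 16)])
```

The 4-bit mode indicator shifts everything by half a byte, so the resulting codewords do not line up with the ASCII bytes of the input.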
Common pitfalls
1. URL too long
V40 H byte mode tops out at 1,273 characters. Long query strings push the version up; cells get tiny and scanners struggle. Shorten the URL first.
2. Poor contrast
Stick to dark-on-light. Reversed (light QR on dark) breaks many decoders that assume "dark = 1." Most apps don't auto-invert.
3. Small print + damage
A 6 mm QR on a business card is risky at L. Use M+ for print; H for outdoor signage where dirt and weather are a factor.
4. Missing quiet zone
The spec requires a 4-cell white margin around the code (the quiet zone). Designers trimming it tight is a top cause of "this QR doesn't scan." Always verify the final print has the margin.
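A rough way to sanity-check a layout is to compute how wide one cell ends up once the quiet zone is included. In this sketch the function name is made up, and `printed_width_mm` is assumed to be the full width available to the code, margin included.

```python
def module_size_mm(printed_width_mm: float, version: int, quiet_zone: int = 4) -> float:
    """Width of one cell when the code plus its quiet zone fills the printed width."""
    cells = 4 * version + 17 + 2 * quiet_zone
    return printed_width_mm / cells


# A Version 1 code squeezed into 6 mm, margin included.
print(round(module_size_mm(6.0, 1), 3))
# A Version 5 code in a 25 mm square.
print(round(module_size_mm(25.0, 5), 3))
```

Whether a given cell size is readable depends on the printer and the camera, so treat the result as something to test on a real print rather than a pass/fail threshold.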
5. Confusing micro QR with regular QR
Micro QR (M1-M4) is a separate symbol format with only one finder and a much smaller capacity. Many scanners don't read it. Stick with full QR for payments and general use.
References
- ISO/IEC 18004:2024 — QR Code specification
- Reed-Solomon error correction — Wikipedia
- Denso Wave (QR inventor) — History of QR Code
Summary
- QR = five function patterns (finder / alignment / timing / format / version) + a data area.
- Four encoding modes (numeric / alphanumeric / byte / kanji) — picked automatically based on the input.
- Reed-Solomon ECC at four levels (L 7% / M 15% / Q 25% / H 30%). Use H for logo overlay.
- One of 8 mask patterns chosen by lowest penalty to avoid finder-like artifacts.
- Print checks: 4-cell quiet zone, dark-on-light contrast, ECC matched to use case.
- Try it directly with QR Code Generator — switch ECC L/M/Q/H and compare.