Base64 represents binary data as text drawn from a 64-character ASCII alphabet. Email attachments, JWT headers and payloads, data URIs, certificates stuffed into environment variables: almost any time you need to squeeze binary through a text-only channel, you'll see Base64. The name may suggest cryptography, but Base64 is not encryption at all. This guide covers how it works, its variants, the size overhead, and the pitfalls that catch most teams.
Why we need encoding
Many protocols and formats only safely handle text. SMTP carries decades of 7-bit ASCII legacy, and JSON cannot embed arbitrary bytes directly. To send an image or a key over these channels, you need to turn the bytes into text.
Base64 is standardized in RFC 4648. It takes three 8-bit bytes (24 bits), chops them into four 6-bit groups, and maps each 6-bit group to one of 64 characters — hence the name. Three input bytes become four output characters, a 4/3× expansion (33% overhead).
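The bit-chopping described above can be sketched by hand. This is a minimal illustration that handles full 3-byte groups only; the names `ALPHABET` and `encode_group` are made up for this example, and real code should use a library encoder.

```python
import string

# Standard Base64 alphabet (RFC 4648): index 0-63 maps to one character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(three_bytes, "big")
    # Slice out four 6-bit values, most significant first.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```

The bytes of "Man" split into the 6-bit values 19, 22, 5, and 46, which map to T, W, F, and u.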
Alphabet and padding
The standard alphabet is these 64 characters:
A-Z a-z 0-9 + /
When the input length isn't a multiple of three, the output is padded with = to reach a multiple of four characters.
| Input bytes | Output chars | Padding |
|---|---|---|
| 3 | 4 | None |
| 2 | 4 | One = |
| 1 | 4 | Two == |
ASCII "Hi" (2 bytes) becomes "SGk=". Try it in Base64 Encode / Decode and you'll see exactly that.
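The padding table can be checked with Python's standard `base64` module. This is a small sketch; the helper name `encode_text` is only for illustration.

```python
import base64

def encode_text(data: bytes) -> str:
    """Return the standard Base64 encoding of data as a str."""
    return base64.b64encode(data).decode("ascii")

# Inputs of 3, 2, and 1 bytes give 0, 1, and 2 padding characters.
for data in (b"Hi!", b"Hi", b"H"):
    encoded = encode_text(data)
    print(len(data), encoded, encoded.count("="))
# 3 SGkh 0
# 2 SGk= 1
# 1 SA== 2
```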
The URL-safe variant (Base64URL)
The standard alphabet's + and / mean other things in URLs (+ is read as a space in query strings, / is the path separator). So RFC 4648 §5 defines a URL-safe alphabet:
- + → -
- / → _
- = padding is usually dropped so the string is safe to embed in a URL as-is
JWT's three segments use exactly this variant. If you see + or /, it's standard Base64. If you see - or _, or a length that isn't a multiple of four without =, it's Base64URL. Base64 Encode / Decode auto-detects both.
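A rough sketch of both ideas in Python: producing the URL-safe form, and guessing the variant from the characters present. The function names are hypothetical, and the detection heuristic simply mirrors the rule of thumb above; it is not how any particular tool works.

```python
import base64

def to_base64url(data: bytes) -> str:
    """Encode with the URL-safe alphabet and drop the = padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def guess_variant(text: str) -> str:
    """Heuristic only: strings without +, /, -, _ or odd length are ambiguous."""
    if "+" in text or "/" in text:
        return "standard"
    if "-" in text or "_" in text or len(text) % 4 != 0:
        return "base64url"
    return "ambiguous"

sample = b"\xfb\xff"
print(base64.b64encode(sample).decode("ascii"))  # +/8=
print(to_base64url(sample))                      # -_8
```

A string such as "SGkh" is valid in both alphabets, which is why the sketch returns "ambiguous" rather than guessing.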
The size overhead in practice
4/3× sounds small, but inlining a 1 MB image as a data URI adds roughly a third of a megabyte to the page. Gzip can recover at most that Base64 overhead, because the underlying image bytes are already compressed and will not shrink further.
So the usual rule of thumb:
- Small icons (under ~5 KB) are fine as data URIs: the saved HTTP request outweighs the overhead.
- Large images stay as files on a CDN. There's no reason to inline them.
- For small binary blobs (signatures, hashes) in JSON, Base64 is 33% shorter than hex. Compare in Hex Encode / Decode.
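The arithmetic behind these rules of thumb is easy to check. In this sketch, `base64_length` is a hypothetical helper for the padded output length, and the SHA-256 digest is just a convenient 32-byte blob.

```python
import base64
import hashlib

def base64_length(n_bytes: int) -> int:
    """Padded Base64 length: four characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

digest = hashlib.sha256(b"example").digest()          # 32 raw bytes
hex_text = digest.hex()                               # 2 chars per byte
b64_text = base64.b64encode(digest).decode("ascii")   # about 4/3 chars per byte

print(len(hex_text), len(b64_text))   # 64 44
print(base64_length(1_000_000))       # 1333336
```

For a 32-byte digest, padding makes the saving over hex about 31% rather than a full third; the gap approaches 33% as the input grows.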
Common use cases
1. Data URIs
data:image/png;base64,iVBORw0KGgo... shows up in CSS background-image, HTML <img src>, and email attachments. Image to Base64 (Data URI) converts a file to a data URI.
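Building a data URI is a one-liner once the bytes are in hand. A minimal sketch, with `to_data_uri` as an illustrative name; it uses the 8-byte PNG signature as input so the result is short enough to read.

```python
import base64

def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URI for the given MIME type."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(PNG_SIGNATURE, "image/png"))
# data:image/png;base64,iVBORw0KGgo=
```

This is why PNG data URIs are recognizable at a glance: they all begin with iVBORw0KGgo.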
2. JWT
As discussed in the JWT guide, the header and payload are Base64URL-encoded JSON. JWT Decoder unpacks them.
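As a rough illustration of what a decoder does with the first two segments, the sketch below builds an unsigned demo token and reads it back. All function names and claim values are invented for the example, and nothing here verifies the signature, so it must not be used to trust a token.

```python
import base64
import json

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    # JWT segments omit padding; add it back before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

def peek_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, payload) WITHOUT checking the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))

# Demo token with a placeholder instead of a real signature.
demo_token = ".".join([
    b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "1234", "name": "demo"}).encode("utf-8")),
    "signature-placeholder",
])
header, payload = peek_jwt(demo_token)
print(header["alg"], payload["name"])  # HS256 demo
```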
3. Basic auth header
Authorization: Basic dXNlcjpwYXNz
That's "user:pass" in Base64. This is not encryption: the credentials are effectively plaintext, so HTTPS is mandatory.
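A short sketch shows both directions: building the header value and recovering the credentials from it. The helper name `basic_auth_header` is only illustrative.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

header_value = basic_auth_header("user", "pass")
print(header_value)  # Basic dXNlcjpwYXNz

# Anyone who sees the header can reverse it with one call.
recovered = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(recovered)  # user:pass
```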
4. Keys in env vars
TLS private keys, GCP service-account JSON, anything multi-line gets wrapped in Base64 to fit a single-line environment variable. Standard pattern in CI/CD secret managers.
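The pattern looks roughly like this in Python. The variable name `EXAMPLE_SECRET_B64` and the fake multi-line secret are placeholders for this sketch; a real pipeline would set the variable in its secret manager rather than in code.

```python
import base64
import os

# Stand-in for a multi-line secret such as a PEM key (not a real key).
multi_line_secret = "-----BEGIN EXAMPLE-----\nline one\nline two\n-----END EXAMPLE-----\n"

# Producer side: collapse the secret to a single line of ASCII.
encoded = base64.b64encode(multi_line_secret.encode("utf-8")).decode("ascii")
os.environ["EXAMPLE_SECRET_B64"] = encoded  # hypothetical variable name

# Consumer side: read the variable and restore the original text.
restored = base64.b64decode(os.environ["EXAMPLE_SECRET_B64"]).decode("utf-8")
print(restored == multi_line_secret)  # True
```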
5. URL tokens
Session tokens, invite links, CSRF tokens — Base64URL lets you embed arbitrary random bytes in a URL. It's easy to confuse with URL Encode / Decode, but the two solve different problems — URL-encoding escapes special chars as %XX, while Base64URL turns arbitrary bytes into text.
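The difference is easier to see side by side. A small sketch using only the standard library; the 24-byte token size is an arbitrary choice for the example.

```python
import base64
import secrets
from urllib.parse import quote

# Base64URL: arbitrary random bytes become URL-safe text.
raw = secrets.token_bytes(24)
token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
print(len(token))  # 32 (24 bytes encode to 32 characters, no padding needed)

# URL-encoding: existing text keeps its letters, special chars become %XX.
escaped = quote("a b&c", safe="")
print(escaped)  # a%20b%26c
```

Python also ships `secrets.token_urlsafe()`, which generates a random Base64URL token in one call.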
Pitfalls
1. Not encryption
Anyone can decode a Base64 string in seconds. "I Base64'd it" hides nothing from anyone who recognizes the alphabet. If you need secrecy, encrypt first (AES, etc.) then Base64 the ciphertext.
2. Newlines and whitespace
RFC 4648 output is a single unbroken string. But PEM (TLS certificates) wraps lines at 64 characters, and MIME wraps them at 76. If you don't strip whitespace before decoding, some libraries error out and others produce garbage.
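Python's `base64.b64decode` illustrates the split between lenient and strict decoders: by default it discards characters outside the alphabet, while `validate=True` rejects them. The sketch below strips whitespace explicitly so that strict decoding still works; `decode_wrapped` is a name made up for this example.

```python
import base64
import binascii

wrapped = "SGVs\nbG8=\n"  # "Hello", broken across two lines

def decode_wrapped(text: str) -> bytes:
    """Remove all whitespace, then decode strictly."""
    compact = "".join(text.split())
    return base64.b64decode(compact, validate=True)

# Strict decoding of the raw wrapped text fails on the newline.
try:
    base64.b64decode(wrapped, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(strict_rejected)          # True
print(decode_wrapped(wrapped))  # b'Hello'
```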
3. Missing padding
Feed a Base64URL token with its padding stripped to a strict decoder and it fails with an error such as "invalid length". Append = characters until the length is a multiple of four, or use a decoder that tolerates missing padding.
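In Python the failure surfaces as `binascii.Error`. A minimal sketch of the fix, with `restore_padding` as an illustrative helper name:

```python
import base64
import binascii

def restore_padding(text: str) -> str:
    """Append = until the length is a multiple of four."""
    if len(text) % 4 == 1:
        # No byte sequence encodes to this length, so padding cannot fix it.
        raise ValueError("not a valid Base64 length")
    return text + "=" * (-len(text) % 4)

# The unpadded form of "SGk=" is rejected by the standard decoder.
try:
    base64.urlsafe_b64decode("SGk")
    unpadded_ok = True
except binascii.Error:
    unpadded_ok = False

decoded = base64.urlsafe_b64decode(restore_padding("SGk"))
print(unpadded_ok, decoded)  # False b'Hi'
```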
4. Character encoding
Encode Korean (or any non-ASCII text) as UTF-8 bytes before Base64. JavaScript's btoa() only handles Latin-1 and throws on any character above U+00FF. The legacy recipe is btoa(unescape(encodeURIComponent(text))), which relies on the deprecated unescape(); the modern approach is new TextEncoder().encode(text), then Base64 on the resulting bytes.
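The same principle in Python, as a short sketch: Base64 works on bytes, so the text has to be turned into UTF-8 bytes first. The failed Latin-1 encode at the end is only a loose analogy for the btoa() limitation, not a description of how btoa() is implemented.

```python
import base64

text = "한글"

# Text -> UTF-8 bytes -> Base64, and back again.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(encoded)          # 7ZWc6riA
print(decoded == text)  # True

# These characters do not fit in Latin-1, so a one-byte-per-character
# encoding has no way to represent them.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```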
5. base64 vs MIME base64
RFC 2045 (MIME) is yet another variant that mandates newlines. Python's base64.b64encode is RFC 4648, but base64.encodebytes is MIME — different output. Spell out which variant you want.
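The difference between the two Python functions shows up as soon as the output exceeds one line. A small sketch with 60 arbitrary bytes:

```python
import base64

data = bytes(range(60))  # 60 bytes encode to 80 Base64 characters

one_line = base64.b64encode(data)       # RFC 4648 style: no line breaks
mime_style = base64.encodebytes(data)   # MIME style: newline after every 76 chars

print(len(one_line), one_line.count(b"\n"))      # 80 0
print(len(mime_style), mime_style.count(b"\n"))  # 82 2

# Same payload once the newlines are removed.
print(mime_style.replace(b"\n", b"") == one_line)  # True
```

Note that `encodebytes` also ends its output with a newline, which is easy to miss when comparing strings.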
Try it
- Base64 Encode / Decode — text/binary ↔ Base64, standard and URL-safe.
- Image to Base64 (Data URI) — image file to data URI.
- JWT Decoder — Base64URL inside JWT, decoded automatically.
Recap
- Base64 = 3 bytes → 4 chars, 33% size growth.
- Standard alphabet plus = padding. URL-safe variant swaps +/ for -_ and drops padding.
- It's encoding, not encryption: Basic auth and JWT payloads are effectively plaintext.
- In JS, encode non-ASCII text as UTF-8 first; btoa() alone breaks.
- Don't Base64 large binaries; a file on a CDN is better. 4/3× overhead and lost compression add up fast.