Base64 Encoding — How It Works and When to Use It

Base64 is binary-to-text encoding, not encryption. Understand padding, URL-safe variants, common pitfalls, and the encoded size overhead.

~7 min read

Base64 represents binary data as ASCII text drawn from a 64-character alphabet. Email attachments, JWT headers and payloads, data URIs, certificates stuffed into environment variables: almost any time you need to squeeze binary through a text-only channel, you'll see Base64. The name can suggest cryptography, but it is not encryption at all. This guide covers how it works, its variants, the size overhead, and the pitfalls that catch most teams.

Why we need encoding

Many protocols and formats only handle text safely. SMTP carries decades of 7-bit ASCII legacy, and JSON cannot embed arbitrary bytes directly. To send an image or a key over these channels, you need to turn the bytes into text.

Base64 is standardized in RFC 4648. It takes three 8-bit bytes (24 bits), chops them into four 6-bit groups, and maps each 6-bit group to one of 64 characters — hence the name. Three input bytes become four output characters, a 4/3× expansion (33% overhead).
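
The bit-slicing described above can be sketched by hand. This is a minimal illustration for one full 3-byte group only; the helper name encode_triplet is made up for this example, and real code should call the standard library instead.

```python
import base64
import string

# The standard RFC 4648 alphabet: A-Z, a-z, 0-9, "+", "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(chunk: bytes) -> str:
    """Encode exactly three bytes into four Base64 characters (illustrative helper)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the three bytes into one 24-bit integer.
    bits = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Slice the 24 bits into four 6-bit indexes, most significant first.
    indexes = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_triplet(b"Man"))                   # TWFu
print(base64.b64encode(b"Man").decode("ascii"))  # same result from the stdlib
```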

Alphabet and padding

The standard alphabet is these 64 characters:

A-Z   a-z   0-9   +   /

When the input length isn't a multiple of three, the output is padded with = to reach a multiple of four characters.

Input bytes (last group)   Output chars   Padding
3                          4              None
2                          4              One =
1                          4              Two ==

ASCII "Hi" (2 bytes) becomes "SGk=". Try it in Base64 Encode / Decode and you'll see exactly that.
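
To check the padding rule yourself, a short sketch with Python's standard base64 module is enough. The helper encoded_length is a name invented here; it just states the "round up to a multiple of four" arithmetic.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (illustrative helper)."""
    return 4 * ((n_bytes + 2) // 3)


for data in (b"Hi!", b"Hi", b"H"):
    encoded = base64.b64encode(data).decode("ascii")
    pad = encoded.count("=")
    # The measured length should always match the formula.
    assert len(encoded) == encoded_length(len(data))
    print(f"{len(data)} byte(s) -> {encoded!r} with {pad} padding char(s)")
```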

The URL-safe variant (Base64URL)

The standard alphabet's + and / mean other things in URLs (+ is space, / is the path separator). So RFC 4648 §5 defines a URL-safe alphabet:

  • + → -
  • / → _
  • = padding is usually dropped so the string is safe to embed in a URL as-is

JWT's three segments use exactly this variant. If you see + or /, it's standard Base64. If you see - or _, or a length that isn't a multiple of four without =, it's Base64URL. Base64 Encode / Decode auto-detects both.
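
A quick way to see the two alphabets diverge is to encode bytes whose 6-bit groups land on indexes 62 and 63. The input bytes below are picked only for that property.

```python
import base64

# 0xFB 0xFF 0xFE splits into the 6-bit groups 62, 63, 63, 62.
raw = bytes([0xFB, 0xFF, 0xFE])

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard, url_safe)  # +//+ -__-

# Python keeps the "=" padding even in the URL-safe variant,
# so JWT-style output needs it stripped explicitly.
unpadded = base64.urlsafe_b64encode(b"Hi").decode("ascii").rstrip("=")
print(unpadded)  # SGk
```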

The size overhead in practice

4/3× sounds small, but inlining a one-megabyte image as a data URI adds roughly a third of a megabyte to the page before compression. Gzip can do little more than win back that expansion, because the already-compressed image bytes underneath do not shrink.

So the usual rule of thumb:

  • Small icons (under ~5 KB) are fine as data URIs — fewer HTTP requests outweighs the overhead.
  • Large images stay as files on a CDN. There's no reason to inline them.
  • For small binary blobs (signatures, hashes) in JSON, Base64 is 33% shorter than hex. Compare in Hex Encode / Decode.
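
The numbers behind these rules of thumb are easy to measure. The sketch below uses random bytes as a stand-in for an already-compressed image and zlib (the same DEFLATE algorithm gzip uses) as a stand-in for HTTP compression; both are assumptions made for illustration.

```python
import base64
import os
import zlib

# A 32-byte blob, e.g. a hash or signature, as hex versus Base64.
digest = os.urandom(32)
hex_text = digest.hex()
b64_text = base64.b64encode(digest).decode("ascii")
print(len(hex_text), len(b64_text))  # 64 44

# Random bytes behave like already-compressed data: no redundancy left.
raw = os.urandom(100_000)
b64 = base64.b64encode(raw)
compressed_b64 = zlib.compress(b64, 9)
print(len(raw), len(b64), len(compressed_b64))
```

On a typical run the compressed Base64 ends up close to the raw size rather than below it: compression removes the Base64 expansion but cannot shrink the underlying bytes.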

Common use cases

1. Data URIs

data:image/png;base64,iVBORw0KGgo... shows up in CSS background-image, HTML <img src>, and email attachments. Image to Base64 (Data URI) converts a file to a data URI.
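
Building a data URI is just string concatenation around the Base64 payload. In this sketch, to_data_uri is a hypothetical helper, and the input is only the 8-byte PNG signature rather than a real image.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URI (illustrative helper)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Every PNG file starts with these eight bytes, which is why
# PNG data URIs all begin with the same "iVBORw0KGgo" prefix.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature, "image/png"))
```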

2. JWT

As discussed in the JWT guide, the header and payload are Base64URL-encoded JSON. JWT Decoder unpacks them.

3. Basic auth header

Authorization: Basic dXNlcjpwYXNz

That's "user:pass" in Base64. It is not encrypted and is effectively plaintext, so HTTPS is mandatory.
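
Reversing the header takes a few lines, which is the whole point. A minimal sketch, where decode_basic_auth is a hypothetical helper and the parsing is deliberately simplified:

```python
import base64


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Split a 'Basic <token>' header value into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(token, validate=True).decode("utf-8")
    # Only the first colon separates user from password.
    user, _, password = decoded.partition(":")
    return user, password


print(decode_basic_auth("Basic dXNlcjpwYXNz"))  # ('user', 'pass')
```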

4. Keys in env vars

TLS private keys, GCP service-account JSON, anything multi-line gets wrapped in Base64 to fit a single-line environment variable. Standard pattern in CI/CD secret managers.
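
The round trip looks roughly like this. The variable name EXAMPLE_KEY_B64 and the placeholder secret are invented for the example; a real pipeline would set the variable in its secret manager, not in code.

```python
import base64
import os

# Placeholder multi-line secret; not a real key.
multi_line_secret = "-----BEGIN EXAMPLE-----\nnot-a-real-key\n-----END EXAMPLE-----\n"

# Pack: the Base64 form contains no newlines, so it fits a single-line variable.
packed = base64.b64encode(multi_line_secret.encode("utf-8")).decode("ascii")
os.environ["EXAMPLE_KEY_B64"] = packed

# Unpack at startup, restoring the original line breaks.
restored = base64.b64decode(os.environ["EXAMPLE_KEY_B64"]).decode("utf-8")
print("\n" in packed, restored == multi_line_secret)  # False True
```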

5. URL tokens

Session tokens, invite links, CSRF tokens — Base64URL lets you embed arbitrary random bytes in a URL. It's easy to confuse with URL Encode / Decode, but the two solve different problems — URL-encoding escapes special chars as %XX, while Base64URL turns arbitrary bytes into text.
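
The difference shows up clearly side by side. A small sketch using only the standard library; the sample string passed to quote is arbitrary.

```python
import base64
import secrets
from urllib.parse import quote

# Base64URL: arbitrary random bytes become URL-safe text.
random_bytes = secrets.token_bytes(16)
token = base64.urlsafe_b64encode(random_bytes).rstrip(b"=").decode("ascii")
print(token, len(token))  # 22 characters for 16 bytes once padding is dropped

# URL-encoding: text that is already text gets its special characters escaped.
escaped = quote("a b&c=d/e", safe="")
print(escaped)  # a%20b%26c%3Dd%2Fe
```

Python's secrets.token_urlsafe() produces the same kind of unpadded Base64URL token in one call.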

Pitfalls

1. Not encryption

Anyone can decode a Base64 string in seconds. "I Base64'd it" hides nothing from anyone who recognizes the alphabet. If you need secrecy, encrypt first (AES, etc.) then Base64 the ciphertext.

2. Newlines and whitespace

RFC 4648 output is a single unbroken string. But PEM (the format used for TLS certificates) wraps lines at 64 characters, and MIME wraps at 76. If you don't strip whitespace before decoding, some libraries error out and others produce garbage.
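
In Python, for instance, strict decoding (validate=True) rejects line breaks, while the default lenient mode silently skips characters outside the alphabet. A sketch of the strip-then-decode approach, with decode_wrapped as an illustrative helper and the 8-character wrap chosen only to keep the example short:

```python
import base64
import binascii

# Simulate wrapped input by breaking an encoded string every 8 characters.
encoded = base64.b64encode(b"Hello, world!").decode("ascii")
wrapped = "\n".join(encoded[i:i + 8] for i in range(0, len(encoded), 8)) + "\n"


def decode_wrapped(text: str) -> bytes:
    """Remove all whitespace, then decode strictly."""
    compact = "".join(text.split())
    return base64.b64decode(compact, validate=True)


try:
    base64.b64decode(wrapped, validate=True)
    strict_failed = False
except binascii.Error:
    strict_failed = True

print(strict_failed)            # True: strict mode rejects the newlines
print(decode_wrapped(wrapped))  # b'Hello, world!'
```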

3. Missing padding

A Base64URL token with padding stripped fed to a strict decoder errors with "invalid length". Append = back to make the length a multiple of four, or use a decoder that tolerates missing padding.
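
Restoring the padding is a one-liner once you know the length rule. A minimal sketch, with b64url_decode as a hypothetical helper name:

```python
import base64
import binascii


def b64url_decode(token: str) -> bytes:
    """Decode Base64URL text whether or not its '=' padding was stripped."""
    # -len % 4 is the number of characters missing to reach a multiple of four.
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


try:
    base64.urlsafe_b64decode("SGk")  # padding stripped: Python refuses this
    unpadded_failed = False
except binascii.Error:
    unpadded_failed = True

print(unpadded_failed)        # True
print(b64url_decode("SGk"))   # b'Hi'
print(b64url_decode("SGk="))  # b'Hi'
```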

4. Character encoding

Encode Korean (or any non-ASCII text) as UTF-8 bytes before Base64. JavaScript's btoa() only handles Latin-1: it throws on any character above U+00FF. The older recipe is btoa(unescape(encodeURIComponent(text))), which relies on the deprecated unescape(); the modern approach is new TextEncoder().encode(text), then Base64-encode the resulting bytes.
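
The same rule applies outside JavaScript: Base64 works on bytes, so the text-to-bytes step has to be explicit and both sides have to agree on it. A sketch in Python, where a Latin-1 encode fails in much the same way btoa() does:

```python
import base64

text = "한글"

# Latin-1 cannot represent these characters at all.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 first, then Base64; reverse both steps to decode.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(latin1_ok, encoded, decoded == text)
```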

5. base64 vs MIME base64

RFC 2045 (MIME) is yet another variant that mandates newlines. Python's base64.b64encode is RFC 4648, but base64.encodebytes is MIME — different output. Spell out which variant you want.
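
The difference between the two Python functions is easy to see on any input longer than one MIME line. The 100-byte sample below is arbitrary.

```python
import base64

data = bytes(range(100))

single_line = base64.b64encode(data)  # RFC 4648: one unbroken string
mime = base64.encodebytes(data)       # MIME: newline after at most 76 chars

print(b"\n" in single_line)                      # False
print([len(line) for line in mime.splitlines()])  # [76, 60]
print(mime.endswith(b"\n"))                      # True: trailing newline too

# Apart from the line breaks, the payload is identical.
print(mime.replace(b"\n", b"") == single_line)   # True
```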

Recap

  • Base64 = 3 bytes → 4 chars, 33% size growth.
  • Standard alphabet plus = padding. URL-safe variant swaps +/ for -_ and drops padding.
  • It's encoding, not encryption — Basic auth and JWT payloads are effectively plaintext.
  • In JS, encode non-ASCII text as UTF-8 first — btoa() alone breaks.
  • Don't Base64 large binaries — file/CDN is better. 4/3× overhead and lost compression add up fast.