URL Encoding Deep Dive — Why %20 Not + and When encodeURIComponent Beats encodeURI

How do you put Korean characters, spaces, or special symbols inside a URL? The answer is percent-encoding (URL encoding). This guide covers which characters are safe, which must be encoded, why a space is sometimes %20 and sometimes +, and the difference between JavaScript's two encoders, encodeURI and encodeURIComponent.

RFC 3986 character classes

Two groups of characters can appear unencoded.

Unreserved (no encoding needed)

A-Z  a-z  0-9  -  _  .  ~

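Always safe. Encoding them gains nothing and can cause trouble. RFC 3986 treats %7E and ~ as equivalent and tells normalizers to decode such escapes, so a signature computed over the encoded form can stop matching once something along the way normalizes the URL.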
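Here is a minimal sketch of that normalization step. It assumes a hypothetical helper named normalizeUnreserved, which is not a standard API. The helper decodes escaped unreserved bytes and uppercases the hex of everything that stays encoded, following RFC 3986 section 6.2.2. Two paths that differ only in %7E versus ~ compare unequal as raw strings but equal after normalizing.

```javascript
// Hypothetical helper: decode percent-encoded *unreserved* bytes only, and
// uppercase the hex digits of every escape that must stay (RFC 3986 §6.2.2).
function normalizeUnreserved(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Unreserved characters go back to their literal form.
    if (/[A-Za-z0-9\-._~]/.test(ch)) return ch;
    // Everything else keeps its escape, normalized to uppercase hex.
    return "%" + hex.toUpperCase();
  });
}

const signed = "/%7Euser/a%2fb";  // form the signer saw
const received = "/~user/a%2Fb";  // form after some proxy normalized it
signed === received;                                            // false
normalizeUnreserved(signed) === normalizeUnreserved(received);  // true
```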

Reserved (encode if used as data)

gen-delims:  :  /  ?  #  [  ]  @
sub-delims:  !  $  &  '  (  )  *  +  ,  ;  =

These carry structural meaning. The same character means different things at different positions.

? in a path looks like the start of a query — must be encoded.
& inside a query value looks like the next parameter.
# anywhere starts a fragment.
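The following is a short illustration that uses a made-up value, R&D #1. When the value goes raw into a query string, the & and # are parsed as structure. When it goes through encodeURIComponent, both characters stay data. The standard URL class does the parsing.

```javascript
const team = "R&D #1"; // example value containing reserved characters

// Raw: & ends the value early and # starts a fragment.
const naive = new URL(`https://example.com/?team=${team}`);
naive.searchParams.get("team"); // "R"
naive.hash;                     // "#1"

// Encoded as data: the whole value survives the round trip.
const safe = new URL(`https://example.com/?team=${encodeURIComponent(team)}`);
safe.searchParams.get("team");  // "R&D #1"
```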

Everything else (encoded)

Korean, Chinese, emoji, spaces, control bytes, and the % sign itself (as %25) are all percent-encoded. First convert the text to UTF-8 bytes, then encode each byte as %HH.

Example: 한 (U+D55C) in UTF-8 is ED 95 9C, so it becomes %ED%95%9C.
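The byte-by-byte procedure can be sketched as follows. percentEncode is a hypothetical helper, not a built-in. It produces UTF-8 bytes with TextEncoder and keeps only unreserved characters raw. That makes it stricter than encodeURIComponent, which also leaves ! ' ( ) * unencoded.

```javascript
// Percent-encode text byte by byte, keeping only RFC 3986 unreserved characters raw.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9\-._~]/.test(ch)) {
      out += ch; // unreserved: leave as-is
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0"); // %HH
    }
  }
  return out;
}

percentEncode("한");        // "%ED%95%9C"
percentEncode("a b!");      // "a%20b%21"
encodeURIComponent("a b!"); // "a%20b!"  (! is left raw by the built-in)
```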

Drop any string into URL Encode / Decode to see encode and decode side by side — useful for understanding which bytes appear.

Spaces — `%20` vs `+`

The infamous confusion.

Path: spaces become %20. + is just a literal plus.
Query string (application/x-www-form-urlencoded): spaces become +. %20 still works, but + is the form encoding standard.
Fragment: %20.

So "hello world" can appear as:

https://example.com/hello%20world (path)
https://example.com/?q=hello+world (form-style query)
https://example.com/?q=hello%20world (RFC 3986 query — safer)

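All three carry "hello world" once each is decoded by the rules of its own context. A form-aware decoder such as URLSearchParams turns + into a space, but decodeURIComponent leaves + as a literal plus. encodeURIComponent("hello world") always emits %20. If you need +, replace it manually.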
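The following compares the two styles and shows how each one decodes. It uses only the built-ins encodeURIComponent, URLSearchParams, and URL.

```javascript
const value = "hello world";

// RFC 3986 style: encodeURIComponent always emits %20.
const rfcStyle = "https://example.com/?q=" + encodeURIComponent(value);
// "https://example.com/?q=hello%20world"

// Form style: URLSearchParams serializes a space as +.
const formStyle = "https://example.com/?" + new URLSearchParams({ q: value });
// "https://example.com/?q=hello+world"

// A form-aware parser reads both back as the same text...
new URL(rfcStyle).searchParams.get("q");  // "hello world"
new URL(formStyle).searchParams.get("q"); // "hello world"
// ...while decodeURIComponent keeps + as a literal plus.
decodeURIComponent("hello+world");        // "hello+world"
```

Replacing %20 with + in encodeURIComponent output does not give byte-identical results to URLSearchParams, because the form serializer also escapes characters such as ! ' ( ) ~.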

The JS encoding functions

`encodeURI`

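Encodes a whole URL but leaves alone the characters that give it structure (; , / ? : @ & = + $ #) and the marks - _ . ! ~ * ' ( ). It does encode [, ], and %, so use it only on a complete URL that has not been encoded yet.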

encodeURI("https://example.com/검색?q=하늘")
// "https://example.com/%EA%B2%80%EC%83%89?q=%ED%95%98%EB%8A%98"

? and = are kept; only the Korean is encoded.
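Here are a few encodeURI behaviors worth checking before relying on it. The expected values follow the ECMAScript definition of encodeURI.

```javascript
// Structure is kept; only the space in the path is encoded.
encodeURI("https://example.com/a b?x=1&y=2");
// "https://example.com/a%20b?x=1&y=2"

// % is not preserved, so an already-encoded URL is encoded a second time.
encodeURI("https://example.com/a%20b");
// "https://example.com/a%2520b"

// [ and ] are escaped as well, which mangles IPv6 literal hosts.
encodeURI("http://[::1]:8080/");
// "http://%5B::1%5D:8080/"
```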

`encodeURIComponent`

Encodes a single piece (a path segment, a query value). Encodes the structural characters too.

encodeURIComponent("https://example.com/검색?q=하늘")
// "https%3A%2F%2Fexample.com%2F%EA%B2%80%EC%83%89%3Fq%3D%ED%95%98%EB%8A%98"

Required when embedding a URL inside another URL's query value (OAuth redirect, link shorteners).
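The following sketches the OAuth-style case. auth.example.com and app.example.com are placeholder hosts, and redirect_uri stands in for the parameter name. Without encodeURIComponent, the nested URL's own & splits off into a separate top-level parameter.

```javascript
// Placeholder hosts: auth.example.com (authorization server), app.example.com (your app).
const redirect = "https://app.example.com/cb?next=/home&lang=ko";

// Without encoding, the nested & leaks out as a top-level parameter.
const broken = new URL("https://auth.example.com/authorize?redirect_uri=" + redirect);
broken.searchParams.get("redirect_uri"); // "https://app.example.com/cb?next=/home"
broken.searchParams.get("lang");         // "ko"

// encodeURIComponent keeps the whole URL inside one query value.
const fixed = new URL(
  "https://auth.example.com/authorize?redirect_uri=" + encodeURIComponent(redirect)
);
fixed.searchParams.get("redirect_uri") === redirect; // true
fixed.searchParams.get("lang");                      // null
```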

Rule of thumb

Building a query value → encodeURIComponent.
Encoding a complete URL that has not been encoded yet → encodeURI.
Assembling URLs → URL + URLSearchParams (the safest path).

const url = new URL("https://example.com/search");
url.searchParams.set("q", "hello world & more");
url.searchParams.set("page", "2");
url.toString();
// "https://example.com/search?q=hello+world+%26+more&page=2"

URLSearchParams uses form-style encoding, so spaces become +. Standard, predictable.
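The form-style rules also protect a literal plus, which the space-as-+ convention would otherwise break. Here is a round trip through URLSearchParams:

```javascript
const params = new URLSearchParams();
params.set("q", "C++ & more");

// A literal + is escaped as %2B so it cannot be read back as a space.
params.toString(); // "q=C%2B%2B+%26+more"

// Parsing the serialized string returns the original value.
new URLSearchParams(params.toString()).get("q"); // "C++ & more"
```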

Internationalized domains — IDN / Punycode

DNS hostnames must be ASCII, so IDNs like 한글.kr are converted to Punycode, an ASCII form whose labels carry the xn-- prefix.

한글.kr  →  xn--bj0bj06e.kr
münchen.de  →  xn--mnchen-3ya.de
example.한국  →  example.xn--3e0b707e

Browsers may show the IDN in the address bar, but DNS resolution and the HTTP Host header always use Punycode. Punycode (IDN) converts both ways.
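In JavaScript, the standard URL class performs this conversion when it parses a host, so you can check the examples above directly. Browsers and Node both implement the same WHATWG URL Standard.

```javascript
// The WHATWG URL parser converts non-ASCII host labels to Punycode.
new URL("https://한글.kr/").hostname;      // "xn--bj0bj06e.kr"
new URL("https://münchen.de/").hostname;   // "xn--mnchen-3ya.de"
new URL("https://example.한국/").hostname; // "example.xn--3e0b707e"

// Paths are percent-encoded instead; only the host uses Punycode.
new URL("https://한글.kr/한글").href;
// "https://xn--bj0bj06e.kr/%ED%95%9C%EA%B8%80"
```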

IDN homograph attack — Cyrillic а (U+0430) looks identical to Latin a (U+0061). аpple.com (with a Cyrillic a) can impersonate Apple. Browsers force display in Punycode when scripts mix suspiciously.
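Browsers apply their own detailed display rules. As a rough check in code, you can at least see which labels of a parsed host needed Punycode. punycodeLabels is an illustrative, hypothetical helper, not a standard function. Flagging xn-- labels is only a first-pass signal, and it also fires on perfectly legitimate IDNs.

```javascript
// Hypothetical helper: list the host labels that had to be Punycode-encoded.
// A crude signal only; browsers apply far more detailed script-mixing rules.
function punycodeLabels(href) {
  return new URL(href)
    .hostname.split(".")
    .filter((label) => label.startsWith("xn--"));
}

const lookalike = "https://\u0430pple.com/"; // \u0430 is Cyrillic small a (U+0430)
new URL(lookalike).hostname;          // "xn--pple-43d.com"
punycodeLabels(lookalike);            // ["xn--pple-43d"]
punycodeLabels("https://apple.com/"); // []
```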

The five parts of a URL

https://user:pass@host.example.com:8080/path/to/resource?key=value#fragment
┬────   ┬──────── ┬─────────────── ┬───┬──────────────── ┬──────── ┬───────
│       │         │                │   │                 │         └ fragment
│       │         │                │   │                 └ query
│       │         │                │   └ path
│       │         │                └ port
│       │         └ host
│       └ userinfo (deprecated for HTTP)
└ scheme

userinfo, host, and port together form the authority, which is how RFC 3986 arrives at five parts: scheme, authority, path, query, fragment.

Each section has its own encoding rules. URL Parser splits the parts so you can pinpoint where encoding broke.
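The standard URL class exposes the same parts, and its setters apply each part's encoding rules. This connects back to the split between %20 and + for spaces:

```javascript
const u = new URL(
  "https://user:pass@host.example.com:8080/path/to/resource?key=value#fragment"
);
u.protocol; // "https:"
u.username; // "user"
u.password; // "pass"
u.hostname; // "host.example.com"
u.port;     // "8080"
u.pathname; // "/path/to/resource"
u.search;   // "?key=value"
u.hash;     // "#fragment"

// Each setter applies the encoding rules of its own part.
const v = new URL("https://example.com/");
v.pathname = "/a b";            // path: space becomes %20
v.searchParams.set("q", "a b"); // query via URLSearchParams: space becomes +
v.hash = "x y";                 // fragment: space becomes %20
v.href; // "https://example.com/a%20b?q=a+b#x%20y"
```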

Common pitfalls

1. Double encoding

Calling encodeURIComponent on something that's already encoded turns % into %25. The receiver decodes once and ends up with literal escape sequences such as %20 instead of the original text.
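The effect takes only a few lines to show, using encodeURIComponent and decodeURIComponent:

```javascript
const once = encodeURIComponent("a b"); // "a%20b"
const twice = encodeURIComponent(once); // "a%2520b": the % itself became %25

// The receiver decodes once and is left with an escape sequence, not "a b".
decodeURIComponent(twice); // "a%20b"
```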

2. Decoding `+` as space everywhere

a+b in a path is just plus. a+b in a form-style query is a space. decodeURIComponent leaves plus alone — for form decoding, do str.replace(/\+/g, " ") first.
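Here is a minimal form-component decoder along those lines. decodeFormComponent is a hypothetical name. When you have a whole query string, URLSearchParams gives the same result.

```javascript
// Hypothetical helper: decode one application/x-www-form-urlencoded component.
function decodeFormComponent(s) {
  // Turn + into a space first; %2B then decodes to a literal +.
  return decodeURIComponent(s.replace(/\+/g, " "));
}

decodeFormComponent("a+b%2Bc");                    // "a b+c"
decodeURIComponent("a+b%2Bc").replace(/\+/g, " "); // "a b c"  (wrong order)
new URLSearchParams("x=a+b%2Bc").get("x");         // "a b+c"
```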

3. String-joining the base path

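baseUrl + "/" + userInput is exposed to ?, #, and ../ in user input (path traversal, query injection). Resolving the input with new URL(userInput, baseUrl) is not a fix either. An absolute or protocol-relative input such as //evil.example replaces the host, and ../ is still resolved. Instead, encode the input as a single path segment with encodeURIComponent, which escapes /, ?, and #, and reject . and .. outright.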
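The following sketches the segment-based approach. appendSegment is a hypothetical helper. It assumes the input should become exactly one path segment under baseUrl. Adapt the rules, such as allowed characters or length limits, to your application.

```javascript
// Hypothetical helper: append one user-supplied value as a single path segment.
function appendSegment(baseUrl, userInput) {
  // "." and ".." would still be resolved as dot segments, so refuse them.
  if (userInput === "." || userInput === "..") {
    throw new Error("invalid path segment");
  }
  const url = new URL(baseUrl);
  // encodeURIComponent escapes /, ?, and #, so the input cannot add segments,
  // start a query, or start a fragment.
  url.pathname = `${url.pathname.replace(/\/$/, "")}/${encodeURIComponent(userInput)}`;
  return url.toString();
}

appendSegment("https://example.com/files/", "../etc/passwd?x#y");
// "https://example.com/files/..%2Fetc%2Fpasswd%3Fx%23y"

// For contrast, resolving raw input lets it swap out the host entirely.
new URL("//evil.example/x", "https://example.com/files/").href;
// "https://evil.example/x"
```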

4. Raw Unicode in `fetch`

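fetch("https://example.com/한글") works in browsers because the URL parser percent-encodes the path automatically. Node's built-in fetch uses the same WHATWG URL parsing, but other server-side HTTP clients may reject or mishandle raw non-ASCII. Encode explicitly so the behavior does not depend on the client.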
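One way to encode explicitly is to let the WHATWG URL parser handle it before any HTTP client sees the string. The request itself is left as a comment here:

```javascript
// Let the WHATWG URL parser produce the ASCII-only form up front.
const target = new URL("https://example.com/한글?q=하늘");
target.href;
// "https://example.com/%ED%95%9C%EA%B8%80?q=%ED%95%98%EB%8A%98"

// Hand the already-encoded string to whichever HTTP client you use, e.g.:
// await fetch(target.href);
```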

5. Base64 collides with URLs

Base64 emits + and /, both URL-significant, and pads with =, which is reserved as well. For URLs, use base64url (+ → -, / → _), usually without the padding. JWT does exactly that.
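Here is a minimal base64url encoder. It assumes the global btoa is available, which is the case in browsers and Node 16+. base64UrlEncode is a hypothetical helper name.

```javascript
// Standard Base64 first, then swap the two URL-significant characters
// and drop the = padding.
function base64UrlEncode(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

const bytes = new Uint8Array([0xfb, 0xff, 0xbf]);
btoa(String.fromCharCode(...bytes)); // "+/+/"
base64UrlEncode(bytes);              // "-_-_"
base64UrlEncode([0xfb]);             // "-w"  (standard Base64: "+w==")
```

Recent Node versions also accept "base64url" as a Buffer encoding, which produces the same unpadded output without the manual replacements.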

Safe assembly example

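function buildSearchUrl(base, query, page) {
  const url = new URL(base);
  // Append a fixed "search" segment without doubling the slash when base ends in "/".
  url.pathname = `${url.pathname.replace(/\/$/, "")}/search`;
  // searchParams.set() encodes each value form-style (space → +, & → %26).
  url.searchParams.set("q", query);
  url.searchParams.set("page", String(page));
  return url.toString();
}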

buildSearchUrl("https://example.com/", "korean text & symbols", 2)
// "https://example.com/search?q=korean+text+%26+symbols&page=2"

There is no hand-built query string. URLSearchParams encodes the values, and the only string work left is appending a fixed path segment.

Summary

Unreserved characters (A-Za-z0-9-._~) can always stay raw, and reserved characters stay raw only in their structural role. Everything else gets percent-encoded.
Spaces: %20 in paths and fragments, + in form-style query strings.
Query value → encodeURIComponent; complete, not-yet-encoded URL → encodeURI; assembly → URL + URLSearchParams.
Non-ASCII domains use Punycode. Beware IDN homograph attacks.
Double encoding, + context confusion, and base64+URL collisions are the perennial bugs.