JSON's grammar fits on a single page. Yet writing a parser forces design decisions that explain its real-world quirks: why trailing commas are forbidden, why JSON.parse('{"id":9999999999999999}') silently returns 10000000000000000, why streaming parsers exist. This guide walks through the parser internals and the surprises seasoned engineers still hit.
The JSON grammar — five value types
value := object | array | string | number | true | false | null
object := { "key" : value , "key" : value ... }
array := [ value , value ... ]
string := " (escaped chars) "
number := -? digits ( .digits )? ( [eE] [+-]? digits )?
That's the whole spec. No functions, variables, comments, or trailing commas. Simplicity is the point: a parser fits in ~100 lines.
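The number rule above is slightly simplified. RFC 8259 also forbids leading zeros in the integer part. A minimal sketch of the full rule as a regular expression follows; the names JSON_NUMBER and isJsonNumber are made up for illustration.

```javascript
// JSON number per RFC 8259: optional minus, integer part without leading
// zeros, optional fraction, optional exponent.
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

// Hypothetical helper: true when the text is a valid JSON number literal.
function isJsonNumber(text) {
  return JSON_NUMBER.test(text);
}

console.log(isJsonNumber("-12.5e+3")); // true
console.log(isJsonNumber("007"));      // false: leading zeros are not allowed
console.log(isJsonNumber(".5"));       // false: a digit must precede the point
console.log(isJsonNumber("+1"));       // false: no leading plus sign
```

Note what is missing: no hex, no NaN, no Infinity. The pitfalls section returns to the last two.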
Two stages — lexer → parser
Many JSON parsers split the work into two stages:
1. Lexer (tokenizer) — characters → tokens
input: {"id": 42, "name": "Yu"}
tokens:
LBRACE {
STRING "id"
COLON :
NUMBER 42
COMMA ,
STRING "name"
COLON :
STRING "Yu"
RBRACE }
The lexer skips whitespace and groups characters into meaningful tokens via a state machine:
- See " → STRING state until the next unescaped "
- See a digit or - → NUMBER state
- See t / f / n → keyword (true / false / null) state
- See { } [ ] : or , → single-character token
- Anything else → error
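The state machine above can be sketched in a few dozen lines. This is one possible shape, not a reference implementation: tokenize() is a hypothetical helper, the token names follow the listing above, and string decoding is delegated to JSON.parse as a shortcut.

```javascript
// Single-character tokens and their names.
const PUNCTUATION = new Map([
  ["{", "LBRACE"], ["}", "RBRACE"], ["[", "LBRACKET"],
  ["]", "RBRACKET"], [":", "COLON"], [",", "COMMA"],
]);
// Sticky regexes match only at lastIndex, which suits a hand-rolled lexer.
const NUMBER_RE = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/y;
const KEYWORD_RE = /true|false|null/y;

function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") { i++; continue; }
    if (PUNCTUATION.has(ch)) { tokens.push({ type: PUNCTUATION.get(ch) }); i++; continue; }
    if (ch === '"') {
      // STRING state: run to the next unescaped quote.
      let end = i + 1;
      while (end < input.length && input[end] !== '"') {
        end += input[end] === "\\" ? 2 : 1; // a backslash hides the next character
      }
      if (end >= input.length) throw new SyntaxError(`Unterminated string at ${i}`);
      // Shortcut for this sketch: let JSON.parse decode the escapes of one literal.
      tokens.push({ type: "STRING", value: JSON.parse(input.slice(i, end + 1)) });
      i = end + 1;
      continue;
    }
    // NUMBER state for a digit or "-", keyword state otherwise.
    const isNumberStart = ch === "-" || (ch >= "0" && ch <= "9");
    const re = isNumberStart ? NUMBER_RE : KEYWORD_RE;
    re.lastIndex = i;
    const match = re.exec(input);
    if (match === null) throw new SyntaxError(`Unexpected character at ${i}`);
    tokens.push(isNumberStart
      ? { type: "NUMBER", value: Number(match[0]) }
      : { type: match[0].toUpperCase() }); // TRUE, FALSE or NULL
    i += match[0].length;
  }
  return tokens;
}

console.log(tokenize('{"id": 42, "name": "Yu"}').map((t) => t.type).join(" "));
// LBRACE STRING COLON NUMBER COMMA STRING COLON STRING RBRACE
```

The lexer knows nothing about nesting or ordering. Input like `} {` tokenizes fine; rejecting it is the parser's job.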
2. Parser — tokens → tree
Recursive descent is the typical choice:
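// peek, consume, expect, parseArray and parseNumber are the usual
// token-stream helpers (not shown). peek() returns the next token's type.
function parseValue() {
  switch (peek()) {
    case LBRACE:   return parseObject();
    case LBRACKET: return parseArray();
    case STRING:   return consume().value;
    case NUMBER:   return parseNumber(consume());
    case TRUE:     consume(); return true;
    case FALSE:    consume(); return false;
    case NULL:     consume(); return null;
    default:       throw new SyntaxError("Unexpected token");
  }
}
function parseObject() {
  const obj = {};
  expect(LBRACE);
  if (peek() === RBRACE) { consume(); return obj; } // empty object
  while (true) {
    const key = expect(STRING).value;
    expect(COLON);
    obj[key] = parseValue();
    if (peek() !== COMMA) break;
    consume(); // ← branch where trailing comma policy lives: after a comma the
               //   loop demands another key, so {"a": 1,} is rejected
  }
  expect(RBRACE);
  return obj;
}
Why trailing commas are forbidden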
{"a": 1, "b": 2,} ← JSON error
[1, 2, 3,] ← JSON error
When Douglas Crockford specified JSON in RFC 4627 (2006), he chose "minimal grammar." Allowing trailing commas would:
- Add a parser branch: parseObject() would need an extra check for RBRACE right after COMMA
- Risk ambiguity about a trailing empty element: [1,2,] has length 2 in Python, but older Internet Explorer versions gave the same JavaScript literal a length of 3
- Buy developer experience (DX) at the cost of spec simplicity, a trade that simplicity won
The cost shows up in diffs: adding a line means modifying the previous line to add a comma. JavaScript, Python, Go, and Rust all allow trailing commas in their own literals, and JSON's refusal is among the most frequent complaints about the format.
Workarounds — JSON5 / JSONC allow trailing commas and comments. tsconfig.json is JSONC. Strict JSON still forbids them.
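A quick way to see the strictness is to feed both forms to the built-in parser. The tryParse wrapper below is a hypothetical convenience, not part of any library.

```javascript
// Hypothetical helper: report success or failure instead of throwing.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

console.log(tryParse('{"a": 1, "b": 2}').ok);  // true
console.log(tryParse('{"a": 1, "b": 2,}').ok); // false
console.log(tryParse('[1, 2, 3,]').ok);        // false

// The same characters are a perfectly valid JavaScript array literal:
console.log([1, 2, 3,].length);                // 3
```

The error message text differs between engines, so code should branch on the thrown SyntaxError rather than on its wording.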
The precision bomb — IEEE 754
The spec has no upper bound on number magnitude:
{"id": 9999999999999999} ← valid per spec
But JavaScript's JSON.parse() returns a Number, an IEEE 754 double-precision float. Integers are safe only up to ±(2^53 − 1) = ±9,007,199,254,740,991. Beyond that, precision is lost:
JSON.parse('{"id":9999999999999999}').id
// 10000000000000000 ← off by 1
Number.MAX_SAFE_INTEGER
// 9007199254740991 (= 2^53 - 1)
Twitter's API was bitten early: its snowflake IDs are 64-bit, and JavaScript clients silently lost the low digits. Fix: ship IDs as strings:
// Bad
{"id": 1234567890123456789}
// Good
{"id": "1234567890123456789"}
Other languages:
- Python: json.loads() handles arbitrarily large ints. No precision loss.
- Go: json.Unmarshal decodes numbers into float64 when the target is interface{}. Use json.Number (via Decoder.UseNumber()) or a typed int64 field to preserve precision.
- Java: Jackson supports BigInteger / BigDecimal.
See it in action — feed a large integer to JSON Formatter / Validator and the tree view shows the precision loss immediately.
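On the JavaScript side, the damage can at least be detected after the fact. The sketch below uses a reviver to flag fields whose parsed value is an integer outside the safe range; findUnsafeIntegers is a hypothetical name. By the time the reviver runs the digits are already gone, so this can only point at the field, not recover it.

```javascript
// Hypothetical helper: list the keys whose parsed value is an integer
// that a double cannot be trusted to hold exactly.
function findUnsafeIntegers(jsonText) {
  const unsafeKeys = [];
  JSON.parse(jsonText, (key, value) => {
    if (typeof value === "number" && Number.isInteger(value) && !Number.isSafeInteger(value)) {
      unsafeKeys.push(key);
    }
    return value; // leave the parsed data untouched
  });
  return unsafeKeys;
}

// Both literals map to the same double, which is the whole problem:
console.log(9999999999999999 === 10000000000000000); // true

console.log(findUnsafeIntegers('{"id": 9999999999999999, "count": 42}').join(",")); // id

// IDs shipped as strings survive untouched:
console.log(JSON.parse('{"id": "1234567890123456789"}').id === "1234567890123456789"); // true
```

A check like this fits in tests or at an API boundary, where a flagged key is a signal that the producer should switch that field to a string.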
BigInt meets JSON
JavaScript got BigInt in 2020. But JSON.stringify(123n) throws a TypeError, because the spec doesn't define BigInt serialization.
Workaround: patch toJSON or pass a replacer to JSON.stringify:
BigInt.prototype.toJSON = function() { return this.toString(); };
JSON.stringify({id: 1234567890123456789n});
// '{"id":"1234567890123456789"}'
String escapes: quiet traps
The escapes JSON strings allow:
\" " (double quote)
\\ \ (backslash)
\/ / (slash, optional)
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\uXXXX UTF-16 code unit (4 hex digits)
For codepoints > U+FFFF, JSON uses UTF-16 surrogate pairs (e.g. "🎉" → "\ud83c\udf89"). Some parsers accept unpaired surrogates and emit invalid UTF-8, a security risk when the output crosses trust boundaries.
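The surrogate arithmetic is short enough to write out. In this sketch, escapeCodePoint and hasLoneSurrogate are hypothetical helpers: the first produces the \u escape for any code point, the second is the kind of check worth running before text crosses a trust boundary.

```javascript
// Hypothetical helper: JSON \u escape for a code point, using a surrogate
// pair when the code point is above U+FFFF.
function escapeCodePoint(codePoint) {
  const hex = (unit) => "\\u" + unit.toString(16).padStart(4, "0");
  if (codePoint <= 0xffff) return hex(codePoint);
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return hex(high) + hex(low);
}

// Hypothetical helper: true when the string holds a surrogate without its partner.
function hasLoneSurrogate(text) {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = text.charCodeAt(i + 1); // NaN at the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) { i++; continue; }
      return true; // high surrogate with no low surrogate after it
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) return true; // stray low surrogate
  }
  return false;
}

console.log(escapeCodePoint(0x1f389));                        // \ud83c\udf89
console.log(JSON.parse('"\\ud83c\\udf89"') === "\u{1F389}");  // true
console.log(hasLoneSurrogate(JSON.parse('"\\ud83c"')));       // true
```

The last line is the trap: JavaScript's JSON.parse accepts the unpaired escape without complaint, so the validation has to happen somewhere else.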
Duplicate keys
{"a": 1, "a": 2} ← valid?
RFC 8259 says key names "should" be unique but doesn't require it. Behavior then depends on the parser, which may:
- Take the last value (JavaScript / Python / Go)
- Take the first (some older parsers)
- Preserve all as an array (CouchDB and friends)
Security implication — if a proxy uses one parser and the API uses another with opposite duplicate-key behavior, you've got an auth-bypass primitive. Be deliberate at trust boundaries.
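One defensive option at a trust boundary is to reject duplicates outright. JSON.parse cannot help here, because a reviver only sees the object after the duplicates have been merged. The sketch below scans the raw text instead; findDuplicateKeys is a hypothetical helper that assumes the input has already passed JSON.parse.

```javascript
// Hypothetical helper: return the keys that appear more than once in the
// same object. Keys are decoded first, so "a" and "\u0061" count as equal.
function findDuplicateKeys(text) {
  JSON.parse(text); // throws on invalid input, so the scan can assume valid JSON
  const duplicates = [];
  const stack = []; // one Set of seen keys per open object, null per open array
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (ch === "{") { stack.push(new Set()); i++; }
    else if (ch === "[") { stack.push(null); i++; }
    else if (ch === "}" || ch === "]") { stack.pop(); i++; }
    else if (ch === '"') {
      let end = i + 1;
      while (text[end] !== '"') end += text[end] === "\\" ? 2 : 1;
      const literal = text.slice(i, end + 1);
      i = end + 1;
      let next = i;
      while (next < text.length && /\s/.test(text[next])) next++;
      if (text[next] === ":") { // a string followed by a colon is a key
        const key = JSON.parse(literal);
        const seen = stack[stack.length - 1];
        if (seen.has(key)) duplicates.push(key);
        else seen.add(key);
      }
    } else {
      i++; // numbers, keywords, commas, colons, whitespace
    }
  }
  return duplicates;
}

console.log(JSON.parse('{"a": 1, "a": 2}').a);                // 2 (last value wins)
console.log(findDuplicateKeys('{"a": 1, "a": 2}').join(",")); // a
```

Scanning whole string literals in one step matters: it keeps braces and colons inside string values from being mistaken for structure.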
Streaming JSON — memory matters
JSON.parse() reads the entire string in one shot. A 1 GB JSON file needs 1 GB+ of memory. Lambda / Cloud Function limits get hit fast.
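To make the contrast concrete, here is a minimal sketch of incremental parsing. It assumes the data is laid out as one JSON value per line (the JSON Lines format described below) and that chunks arrive as already-decoded strings; JsonLinesReader is a hypothetical class, not a library API.

```javascript
// Hypothetical helper: feed it chunks as they arrive, get back the records
// that are complete so far. Only the unfinished tail is kept in memory.
class JsonLinesReader {
  constructor() {
    this.buffer = "";
  }

  // Returns the values of every line completed by this chunk.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // the last piece may be an incomplete line
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line));
  }

  // Call once the input ends, to flush a final line with no trailing newline.
  end() {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? [] : [JSON.parse(rest)];
  }
}

const reader = new JsonLinesReader();
console.log(reader.push('{"user": "alice", "ts": 170').length);                  // 0
console.log(reader.push('0000000}\n{"user": "bob", "ts": 1700000001}').length);  // 1
console.log(reader.end().length);                                                // 1
```

Memory use is bounded by the longest line rather than by the file size, which is the property a single JSON.parse call cannot offer.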
Options — streaming parsers emit tokens via callbacks:
- SAX-style: onObjectStart / onKey / onValue callbacks. The caller builds the structure they actually need.
- JSONPath streaming: extract only a specific path. Process items in a huge array one at a time. JSONStream (Node) / ijson (Python).
- JSON Lines (JSONL / NDJSON): one JSON object per line. Line-by-line streaming is natural. Standard for logs and analytics:
{"user": "alice", "ts": 1700000000}
{"user": "bob", "ts": 1700000001}
{"user": "carol", "ts": 1700000002}
// Each line is its own JSON. No need to load the whole file.
MongoDB EJSON: adding types back
BSON (MongoDB's binary format) has ObjectId, Date, Decimal128 — types JSON doesn't model. MongoDB Extended JSON wraps them in marker objects:
{
"_id": { "$oid": "507f1f77bcf86cd799439011" },
"created": { "$date": "2026-05-22T00:00:00Z" },
"price": { "$numberDecimal": "19.99" }
}
MongoDB Extended JSON recognizes the 16 wrapper types. The tree view in JSON Formatter / Validator also auto-detects EJSON when the toggle is on.
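Unwrapping the markers is a natural job for a reviver. The sketch below handles only the three wrappers from the example above and assumes $date carries an ISO 8601 string. reviveEjson and the tagged plain objects it returns are hypothetical stand-ins; a real application would use the types from its MongoDB driver.

```javascript
// Hypothetical reviver: turn single-key marker objects back into richer values.
function reviveEjson(key, value) {
  if (value === null || typeof value !== "object" || Array.isArray(value)) return value;
  const keys = Object.keys(value);
  if (keys.length !== 1) return value; // wrappers have exactly one key
  const inner = value[keys[0]];
  if (typeof inner !== "string") return value; // only string payloads are handled here
  switch (keys[0]) {
    case "$date":          return new Date(inner);
    case "$oid":           return { type: "ObjectId", hex: inner };     // stand-in type
    case "$numberDecimal": return { type: "Decimal128", text: inner };  // kept as text, no float rounding
    default:               return value;
  }
}

const doc = JSON.parse(`{
  "_id": { "$oid": "507f1f77bcf86cd799439011" },
  "created": { "$date": "2026-05-22T00:00:00Z" },
  "price": { "$numberDecimal": "19.99" }
}`, reviveEjson);

console.log(doc.created instanceof Date); // true
console.log(doc.created.toISOString());   // 2026-05-22T00:00:00.000Z
console.log(doc.price.text);              // 19.99
```

Keeping the decimal as text is deliberate: converting "19.99" to a Number would reintroduce exactly the rounding that Decimal128 exists to avoid.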
Common pitfalls
1. JSON.stringify and undefined
JSON.stringify({a: undefined, b: 1}) // '{"b":1}' ← a dropped
JSON.stringify([undefined]) // '[null]' ← coerced to null
JSON.stringify(undefined) // undefined ← the function itself returns undefined
2. NaN / Infinity
JSON.stringify({x: NaN}) // '{"x":null}'
JSON.stringify({x: Infinity}) // '{"x":null}'
JSON has no representation for NaN or Infinity. They round-trip as null, silently. Preserve them as strings if you need to.
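Preserving them as strings can be done with a replacer and reviver pair. This is a sketch under one loud assumption: both sides agree that the strings "NaN", "Infinity" and "-Infinity" are reserved, so a genuine string with one of those values would be decoded as a number. The names encodeSpecial and decodeSpecial are made up.

```javascript
const SPECIAL_NUMBERS = new Map([
  ["NaN", NaN],
  ["Infinity", Infinity],
  ["-Infinity", -Infinity],
]);

// Replacer: runs before JSON.stringify turns non-finite numbers into null.
function encodeSpecial(key, value) {
  return typeof value === "number" && !Number.isFinite(value) ? String(value) : value;
}

// Reviver: maps the reserved strings back to numbers.
function decodeSpecial(key, value) {
  return typeof value === "string" && SPECIAL_NUMBERS.has(value)
    ? SPECIAL_NUMBERS.get(value)
    : value;
}

const text = JSON.stringify({ x: NaN, y: -Infinity, z: 1 }, encodeSpecial);
console.log(text); // {"x":"NaN","y":"-Infinity","z":1}

const back = JSON.parse(text, decodeSpecial);
console.log(Number.isNaN(back.x)); // true
console.log(back.y === -Infinity); // true
```

If the reserved-string collision is unacceptable, a wrapper object in the style of EJSON avoids the ambiguity at the cost of a bulkier payload.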
3. Circular references
const a = {};
a.self = a;
JSON.stringify(a); // TypeError: Converting circular structure to JSON
4. Date's automatic toJSON
JSON.stringify({d: new Date()})
// '{"d":"2026-05-22T05:30:00.000Z"}'
// Date's toJSON() returns an ISO 8601 string
// But parsing doesn't restore it
JSON.parse('{"d":"2026-05-22T05:30:00.000Z"}').d
// "2026-05-22T05:30:00.000Z" (still a string!)
Restore with a reviver:
JSON.parse(str, (k, v) =>
typeof v === "string" && /^\d{4}-\d{2}-\d{2}T/.test(v)
? new Date(v) : v);
5. Large-input typing freeze
JSON.parse is synchronous and blocks the main thread. For large inputs, JSON Formatter / Validator debounces parsing once the input passes 4 KB (the same pattern from PR #77).
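The general shape of that pattern, as a hedged sketch rather than the tool's actual code: small inputs parse immediately, large ones wait until typing pauses. Only the 4 KB threshold comes from the text above. The 300 ms delay, the function names, and treating 4 KB as 4096 characters are all assumptions made for illustration.

```javascript
const LARGE_INPUT_CHARS = 4 * 1024; // approximation: counts characters, not bytes

// Hypothetical factory: returns a function to call on every keystroke.
// The timers parameter exists so the delay can be faked in tests.
function createParseScheduler(onResult, delayMs = 300, timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (id) => clearTimeout(id),
}) {
  let pending = null;

  const run = (text) => {
    try {
      onResult({ ok: true, value: JSON.parse(text) });
    } catch (err) {
      onResult({ ok: false, error: err.message });
    }
  };

  return function schedule(text) {
    if (pending !== null) { // newer input supersedes a parse that has not run yet
      timers.clear(pending);
      pending = null;
    }
    if (text.length < LARGE_INPUT_CHARS) {
      run(text); // small input: parse right away
      return;
    }
    pending = timers.set(() => {
      pending = null;
      run(text);
    }, delayMs);
  };
}

const schedule = createParseScheduler((result) => console.log(result.ok));
schedule('{"a": 1}'); // true (logged immediately)
```

Debouncing only spaces the work out. A parse that is itself slow still blocks the thread while it runs, which is where a worker thread or a streaming parser comes in.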
References
- RFC 8259 (JSON standard) — datatracker
- json.org — Crockford's original page
- IEEE 754 double precision — Wikipedia
- MongoDB Extended JSON — official docs
Summary
- JSON has 5 value types + object/array. A working parser is ~100 lines.
- Lexer (state machine) → Parser (recursive descent). Two passes.
- No trailing commas — a deliberate spec-simplicity trade. Use JSON5/JSONC for files that need them.
- Numbers go through IEEE 754 double. Integers past ±2^53 lose precision. Send IDs as strings.
- Duplicate keys are undefined behavior — parser-dependent. Plan for it at trust boundaries.
- Streaming = JSONL / NDJSON. Don't read 1 GB in one parse.
- EJSON layers Date / ObjectId / Decimal128 on top of JSON.
- Try it — JSON Formatter / Validator / JSON Path / MongoDB Extended JSON / JSON → TypeScript / JSON Schema Generator.