Open logo.jpg in a text editor and the first line shows ÿØÿà-ish glyphs. image.png begins with 89 50 4E 47. Those are magic numbers — self-identifying signatures baked into each file format. Operating systems, browsers, and libraries use them to identify file types regardless of the extension. This guide walks through the common ones, why extensions can't be trusted, and the security implications of content sniffing.
Why magic numbers?
File extensions (.jpg, .pdf) are suggestions — anyone can rename:
mv malware.exe vacation.jpg
→ Extension says .jpg but the contents are an executable
→ A system that picks a handler from the extension alone, as Windows does, is misled → security incident
Magic numbers are signatures stored inside the file itself. Ignore the extension, read the bytes.
Common magic numbers
| File type | First bytes (hex) | ASCII / meaning |
|---|---|---|
| PNG | 89 50 4E 47 0D 0A 1A 0A | ‰PNG\r\n\x1a\n |
| JPEG | FF D8 FF | (SOI marker) |
| GIF | 47 49 46 38 37 61 / 47 49 46 38 39 61 | GIF87a / GIF89a |
| WebP | 52 49 46 46 ?? ?? ?? ?? 57 45 42 50 | RIFF....WEBP |
| AVIF | ?? ?? ?? ?? 66 74 79 70 + 'avif' at offset 8 | ...ftypavif |
| PDF | 25 50 44 46 2D | %PDF- |
| ZIP / DOCX / XLSX / APK | 50 4B 03 04 | PK.. |
| RAR | 52 61 72 21 1A 07 | Rar!\x1a\x07 |
| gzip | 1F 8B | |
| MP3 | FF FB / 49 44 33 | (MPEG frame sync) / ID3 (ID3v2 tag) |
| MP4 / MOV | ?? ?? ?? ?? 66 74 79 70 at offset 4 | ...ftyp... |
| Windows EXE / DLL | 4D 5A | MZ (Mark Zbikowski) |
| ELF (Linux executable) | 7F 45 4C 46 | \x7fELF |
| Mach-O (macOS exec) | FE ED FA CE / FE ED FA CF | 32-bit / 64-bit; stored byte-swapped (CE FA ED FE / CF FA ED FE) on little-endian Macs |
| Java class | CA FE BA BE | (famous) |
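
A table like this maps naturally onto a small lookup. The following is a minimal sketch covering a subset of the rows; the names `SIGNATURES` and `detectType` are hypothetical, and a `null` inside a pattern stands for the `??` wildcard cells. It only compares leading bytes and does not validate the rest of the file.

```javascript
// Each entry is compared at a fixed offset.
// A null inside `bytes` is a wildcard (the "??" cells in the table).
const SIGNATURES = [
  { type: "PNG", offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "JPEG", offset: 0, bytes: [0xff, 0xd8, 0xff] },
  { type: "GIF", offset: 0, bytes: [0x47, 0x49, 0x46, 0x38, 0x37, 0x61] },
  { type: "GIF", offset: 0, bytes: [0x47, 0x49, 0x46, 0x38, 0x39, 0x61] },
  {
    type: "WebP",
    offset: 0,
    bytes: [0x52, 0x49, 0x46, 0x46, null, null, null, null, 0x57, 0x45, 0x42, 0x50],
  },
  { type: "MP4 / MOV / AVIF", offset: 4, bytes: [0x66, 0x74, 0x79, 0x70] },
  { type: "PDF", offset: 0, bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] },
  { type: "ZIP", offset: 0, bytes: [0x50, 0x4b, 0x03, 0x04] },
  { type: "RAR", offset: 0, bytes: [0x52, 0x61, 0x72, 0x21, 0x1a, 0x07] },
  { type: "gzip", offset: 0, bytes: [0x1f, 0x8b] },
  { type: "Windows EXE / DLL", offset: 0, bytes: [0x4d, 0x5a] },
  { type: "ELF", offset: 0, bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { type: "Java class", offset: 0, bytes: [0xca, 0xfe, 0xba, 0xbe] },
];

// Returns the first matching type name, or "unknown".
// `bytes` can be a Uint8Array, a Node Buffer, or a plain array of numbers.
function detectType(bytes) {
  for (const sig of SIGNATURES) {
    if (bytes.length < sig.offset + sig.bytes.length) continue;
    const hit = sig.bytes.every(
      (b, i) => b === null || bytes[sig.offset + i] === b
    );
    if (hit) return sig.type;
  }
  return "unknown";
}
```

In Node, the result of `fs.readFileSync(path)` can be passed in directly, since a Buffer is indexable like a Uint8Array.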
To inspect a file, Hex Encode / Decode reveals the hex bytes of any file. The Unix file command also works:
$ file mystery.bin
mystery.bin: PNG image data, 1920 x 1080, 8-bit/color RGBA
$ xxd mystery.bin | head -1
00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
Containers within containers
ZIP-based — DOCX / XLSX / APK / JAR
50 4B 03 04 ... ← ZIP magic
DOCX:
- Extension: .docx
- Actually a ZIP wrapping XML files
- Unzip and you'll see word/document.xml etc.
APK (Android):
- ZIP + AndroidManifest.xml + classes.dex
JAR (Java):
- ZIP + META-INF/MANIFEST.MF
Inspect with file or unzip -l. A corrupted DOCX can sometimes be unzipped to recover the inner XML.
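
Looking inside can be sketched without a ZIP library by reading only the archive's central directory, which lists every entry name. This is a rough illustration under stated assumptions: the function names are hypothetical, ZIP64 and non-ASCII names are ignored, and `xl/workbook.xml` is the conventional XLSX workbook path rather than something taken from the text above.

```javascript
// Read a little-endian unsigned integer of `size` bytes at `pos`.
function readLE(bytes, pos, size) {
  let value = 0;
  for (let i = size - 1; i >= 0; i--) value = value * 256 + bytes[pos + i];
  return value;
}

// List entry names by walking the ZIP central directory.
function listZipEntries(bytes) {
  // The End of Central Directory record (PK\x05\x06) sits near the end of the file.
  let eocd = -1;
  for (let i = bytes.length - 22; i >= 0; i--) {
    if (readLE(bytes, i, 4) === 0x06054b50) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) return [];
  const count = readLE(bytes, eocd + 10, 2); // total number of entries
  let pos = readLE(bytes, eocd + 16, 4); // offset of the central directory
  const names = [];
  for (let n = 0; n < count; n++) {
    // Each central directory header starts with PK\x01\x02.
    if (readLE(bytes, pos, 4) !== 0x02014b50) break;
    const nameLen = readLE(bytes, pos + 28, 2);
    const extraLen = readLE(bytes, pos + 30, 2);
    const commentLen = readLE(bytes, pos + 32, 2);
    let name = "";
    for (let i = 0; i < nameLen; i++) {
      name += String.fromCharCode(bytes[pos + 46 + i]);
    }
    names.push(name);
    pos += 46 + nameLen + extraLen + commentLen;
  }
  return names;
}

// Guess the format from well-known member names.
// APK is checked first because signed APKs also carry META-INF/MANIFEST.MF.
function classifyZip(names) {
  if (names.includes("AndroidManifest.xml")) return "APK";
  if (names.includes("word/document.xml")) return "DOCX";
  if (names.includes("xl/workbook.xml")) return "XLSX";
  if (names.includes("META-INF/MANIFEST.MF")) return "JAR";
  return "ZIP";
}
```

Calling `classifyZip(listZipEntries(bytes))` on a whole file gives a coarse label; anything unrecognized falls back to plain "ZIP".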
RIFF — WebP / WAV / AVI
RIFF magic: 52 49 46 46 (4 bytes) + size (4 bytes) + form type (4 bytes)
WAV: RIFF....WAVE
AVI: RIFF....AVI
WebP: RIFF....WEBP
RIFF is a generic container: bytes 5-8 hold the chunk size, and bytes 9-12 (offsets 8-11) hold the form type that says which actual format (audio / video / image) is inside.
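
The 12-byte RIFF header can be read with a few lines. A minimal sketch, with `readRiffHeader` as a hypothetical name; it returns `null` for anything that does not start with RIFF, and it does not check that the declared size matches the file length.

```javascript
// Parse the 12-byte RIFF header: "RIFF" + little-endian size + form type.
function readRiffHeader(bytes) {
  if (bytes.length < 12) return null;
  const ascii = (from, to) => {
    let s = "";
    for (let i = from; i < to; i++) s += String.fromCharCode(bytes[i]);
    return s;
  };
  if (ascii(0, 4) !== "RIFF") return null;
  // Size is a 32-bit little-endian integer at offsets 4-7.
  const size =
    bytes[4] + bytes[5] * 256 + bytes[6] * 65536 + bytes[7] * 16777216;
  // Form type sits at offsets 8-11: "WAVE", "WEBP", or "AVI " (trailing space).
  return { size, formType: ascii(8, 12) };
}
```

Note that the AVI form type is four characters long, so it comes back as `"AVI "` with a trailing space.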
MIME type — magic numbers' friend on the web
The MIME type is what the server declares in the Content-Type response header. The browser uses it to decide how to render:
Content-Type: image/png → render in <img>
Content-Type: text/html → parse as HTML
Content-Type: application/pdf → PDF viewer
Content-Type: application/octet-stream → download prompt
When servers send the wrong MIME type, browsers may "sniff" the magic bytes and guess. That's where security bugs live.
MIME sniffing pitfalls
Server: Content-Type: text/plain
Actual content: starts with <html>
Old IE / Chrome: "looks like HTML" → renders as text/html
→ A user-uploaded .txt rendered as HTML
→ XSS via user input
Fix: send X-Content-Type-Options: nosniff. The browser disables sniffing and trusts the declared MIME type. Treat it as a default header on modern sites.
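
On the server side this can be as small as a helper that builds the response headers for user-uploaded files. The sketch below is one possible policy, not a standard: `headersForUpload` and the `INLINE_SAFE` allow-list are hypothetical, and forcing a download through `Content-Disposition: attachment` for everything else is an assumption added here.

```javascript
// Hypothetical allow-list of types that may render inline.
const INLINE_SAFE = new Set([
  "image/png",
  "image/jpeg",
  "image/gif",
  "text/plain",
]);

// Build response headers for serving a user-uploaded file.
function headersForUpload(mime) {
  // Always tell the browser not to second-guess the declared type.
  const headers = { "X-Content-Type-Options": "nosniff" };
  if (INLINE_SAFE.has(mime)) {
    headers["Content-Type"] = mime;
  } else {
    // Anything else is offered as a download instead of being rendered.
    headers["Content-Type"] = "application/octet-stream";
    headers["Content-Disposition"] = "attachment";
  }
  return headers;
}
```

With Node's built-in http module, the returned object can be passed to `res.writeHead(200, headersForUpload(mime))`.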
File upload validation — never trust the extension alone
A form that only accepts .jpg:
// Bad: extension only
if (!file.name.endsWith(".jpg")) reject();
// Attacker: malware.exe → malware.jpg
// Better: check magic number
const buffer = await file.slice(0, 4).arrayBuffer();
const bytes = new Uint8Array(buffer);
if (bytes[0] !== 0xFF || bytes[1] !== 0xD8 || bytes[2] !== 0xFF) {
  reject("Not a JPEG");
}
// Best: combine both checks, then actually decode with a real image library
// (server side, where file is a Buffer or a path)
await sharp(file).metadata(); // rejects if not a valid image
Even magic numbers aren't a 100% guarantee: polyglot files (valid as both PHP and JPG) exist. For high-security contexts, decode inside a sandbox.
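
The extension check and the magic check can be folded into one function that also verifies the two agree. A minimal sketch, assuming a hypothetical `checkUpload` helper and an allow-list limited to JPEG and PNG; it is a pre-filter only and does not replace decoding the image afterwards.

```javascript
// Hypothetical allow-list: extension → signature the content must start with.
const ALLOWED = new Map([
  ["jpg", [0xff, 0xd8, 0xff]],
  ["jpeg", [0xff, 0xd8, 0xff]],
  ["png", [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]],
]);

// Accept a file only if its extension is allowed AND its leading bytes
// match the signature expected for that extension.
function checkUpload(fileName, bytes) {
  const dot = fileName.lastIndexOf(".");
  const ext = dot < 0 ? "" : fileName.slice(dot + 1).toLowerCase();
  const magic = ALLOWED.get(ext);
  if (!magic) return { ok: false, reason: "extension not allowed" };
  const matches =
    bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
  if (!matches) return { ok: false, reason: "content does not match extension" };
  return { ok: true };
}
```

In a browser, the bytes could come from `new Uint8Array(await file.slice(0, 8).arrayBuffer())`, so only the first few bytes are read.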
Interesting magic-number history
PNG's deliberate signature
89 50 4E 47 0D 0A 1A 0A
- 89: high bit set, breaks the "is this text?" assumption and exposes transfers that clear bit 7
- 50 4E 47: "PNG"
- 0D 0A: CR LF (Windows newline)
- 1A: DOS EOF marker
- 0A: LF (Unix newline)
Purpose: detect newline corruption immediately.
Old FTP clients mangled CR/LF when transferring "text." PNG's signature exposes that corruption within the first eight bytes, before any image data is decoded.
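
Because each byte guards against a specific kind of damage, a mismatched signature can hint at what went wrong in transit. The sketch below is illustrative only; `diagnosePngSignature` and its result strings are hypothetical, and it recognizes just a few common corruption patterns.

```javascript
const PNG_SIG = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// True if `bytes` begins with every byte of `prefix`.
function startsWith(bytes, prefix) {
  return prefix.every((b, i) => bytes[i] === b);
}

// Classify the first bytes of a file that is supposed to be a PNG.
function diagnosePngSignature(bytes) {
  if (startsWith(bytes, PNG_SIG)) return "ok";
  // A 7-bit channel cleared the high bit: 0x89 became 0x09.
  if (startsWith(bytes, [0x09, 0x50, 0x4e, 0x47])) return "high bit stripped";
  if (!startsWith(bytes, [0x89, 0x50, 0x4e, 0x47])) return "not a PNG";
  // CR LF collapsed to a bare LF (Windows → Unix text conversion).
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0a, 0x1a])) {
    return "CRLF converted to LF";
  }
  // A bare LF expanded to CR LF (Unix → Windows text conversion).
  if (
    startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0d, 0x0a]) ||
    startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0d, 0x0a])
  ) {
    return "LF converted to CRLF";
  }
  return "damaged PNG signature";
}
```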
Java's CAFEBABE
CA FE BA BE. The Java team picked hex digits that spell something memorable; James Gosling has traced the "CAFE" part to a café the team frequented. No deeper meaning, just 4 bytes you can pronounce.
EXE's MZ
Named after Mark Zbikowski, a Microsoft engineer. Goes back to MS-DOS in the early 1980s. Modern Windows .exe files still start with MZ (a DOS stub and then the PE header follow).
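
The MZ header tells a loader where the PE header lives: a 4-byte little-endian offset stored at 0x3C (the field is called e_lfanew in the PE format documentation) points at the bytes 50 45 00 00, "PE\0\0". A minimal sketch, with `looksLikePE` as a hypothetical name; it checks only these two signatures and nothing else about the file.

```javascript
// Follow the MZ header to the PE signature.
function looksLikePE(bytes) {
  // The DOS header is at least 0x40 bytes and starts with "MZ".
  if (bytes.length < 0x40 || bytes[0] !== 0x4d || bytes[1] !== 0x5a) {
    return false;
  }
  // e_lfanew: 4-byte little-endian file offset of the PE header, at 0x3C.
  const peOffset =
    bytes[0x3c] +
    bytes[0x3d] * 256 +
    bytes[0x3e] * 65536 +
    bytes[0x3f] * 16777216;
  // The PE header begins with "PE\0\0".
  return (
    peOffset + 4 <= bytes.length &&
    bytes[peOffset] === 0x50 &&
    bytes[peOffset + 1] === 0x45 &&
    bytes[peOffset + 2] === 0x00 &&
    bytes[peOffset + 3] === 0x00
  );
}
```

A plain DOS executable passes the MZ check but fails the PE check, which is how the two can be told apart.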
Detection libraries
- libmagic (backend of Unix file): thousands of patterns. Beyond the first bytes, it inspects structure.
- file-type (Node): detects the type from magic bytes and reports the matching extension and MIME type. Works on a Buffer or stream.
- python-magic: Python bindings to libmagic.
Base64 magic — when image previews "begin" with the same bytes
Data URI:
data:image/png;base64,iVBORw0KGgoAAAA...
│
└─ Decode "iVBORw0K..." and you get the PNG magic
"89 50 4E 47 ..." back
Browser handling a Data URI:
1. Base64 decode
2. Check first bytes (or trust the mime)
3. Render as <img>
Image to Base64 (Data URI) converts files ↔ Data URIs. The base64 prefix you see is the magic bytes encoded in base64.
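
The relationship is easy to confirm by decoding only the start of a Data URI. A minimal sketch, with `dataUriLeadingBytes` as a hypothetical name; it assumes a global `atob` (browsers and recent Node versions) and a base64 payload with no embedded whitespace.

```javascript
// Return the first `count` decoded bytes of a base64 data URI, or null.
function dataUriLeadingBytes(uri, count) {
  const marker = ";base64,";
  const at = uri.indexOf(marker);
  if (at < 0) return null; // not a base64 data URI
  const payload = uri.slice(at + marker.length);
  // 4 base64 characters decode to 3 bytes, so decode just enough whole groups.
  const chars = Math.ceil(count / 3) * 4;
  const decoded = atob(payload.slice(0, chars));
  return Array.from(decoded.slice(0, count), (ch) => ch.charCodeAt(0));
}
```

The same arithmetic explains other familiar prefixes: a JPEG's FF D8 FF encodes to `/9j/`, which is why JPEG Data URIs start with those four characters.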
Common pitfalls
1. UTF-8 BOM colliding with detection
EF BB BF ← UTF-8 BOM
↓
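A text-based file saved with a BOM has its real signature pushed to offset 3, so a naive detector comparing at offset 0 misses it: "is this PDF (25 50 44 46)?" fails on the leading EF.
Skip the BOM before comparing text-based signatures.
2. ZIP-based formats blur together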
DOCX / XLSX / APK / JAR all share PK.. magic. Identifying the actual type means looking inside. file does this for you.
3. Polyglot files
The same bytes valid as two formats:
GIF/JS polyglot:
GIF89a/*...*/=1;script=...
↓
- GIF parser: valid 1×1 GIF
- JS parser: valid JavaScript
→ uploaded as "image", loaded via <script src> → XSS
Defense: Content-Type + nosniff + sandbox decoding.
4. SVG has no magic
SVG is XML text. The "magic" is just <?xml or <svg. Binary-style detection doesn't apply, and SVG can carry <script> — an XSS risk.
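
Detecting SVG therefore means looking at text, not fixed bytes. The sketch below is a heuristic under stated assumptions: `looksLikeSvg` is a hypothetical name, the input must be a Uint8Array (or Node Buffer), only UTF-8 is handled, and only the first 1024 bytes are examined. It also skips a UTF-8 BOM, since a BOM would otherwise hide the opening tag.

```javascript
// Heuristic text sniff for SVG.
function looksLikeSvg(bytes) {
  let start = 0;
  // Skip a UTF-8 byte order mark (EF BB BF) if present.
  if (
    bytes.length >= 3 &&
    bytes[0] === 0xef &&
    bytes[1] === 0xbb &&
    bytes[2] === 0xbf
  ) {
    start = 3;
  }
  // Decode only the head of the file and ignore leading whitespace.
  const head = new TextDecoder("utf-8")
    .decode(bytes.subarray(start, start + 1024))
    .trimStart();
  if (head.startsWith("<svg")) return true;
  // An XML declaration or SVG doctype may precede the root element.
  const hasProlog =
    head.startsWith("<?xml") || /^<!doctype\s+svg/i.test(head);
  return hasProlog && head.includes("<svg");
}
```

A positive result is a reason to treat the upload as active content (sanitize it or serve it as a download), not a guarantee that the file is safe.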
5. ImageIO trusting extensions
Java / .NET image libraries that guess the type from the file name can mis-decode when the actual type differs. Validate the magic before passing to the library.
Summary
- A magic number is the file's self-identifying signature. Trust it over the extension.
- PNG (89 50 4E 47), JPEG (FF D8 FF), PDF (%PDF-), ZIP (PK\x03\x04), MZ (EXE), CAFEBABE (Java class), and more.
- ZIP-based formats (DOCX/XLSX/APK/JAR) share PK magic and need inner inspection to distinguish.
- RIFF is a generic container for WebP / WAV / AVI.
- MIME sniffing is a security risk: set X-Content-Type-Options: nosniff.
- File-upload validation = extension + magic + sandbox decode.
- Polyglot files (GIF + JS) exist — magic alone isn't enough.
- Try it: Hex Encode / Decode to read raw bytes; Image to Base64 (Data URI) turns Data URIs back into magic-prefixed buffers.