How Image Compression Actually Works (PNG, JPG, WebP, AVIF)

What's inside a JPEG, why PNG is lossless, how WebP and AVIF squeeze further without visible loss. Discrete cosine transform, Huffman coding, palette indexing, and the trade-offs each format makes.

Save the same photo as PNG and it's 5 MB. As JPEG, 500 KB. As WebP, 300 KB. As AVIF, 200 KB. Same pixel count, similar quality — where does the difference come from? This guide walks through what each format actually does inside — Huffman coding, the discrete cosine transform, palette indexing, and the tricks that exploit the human visual system.

Start with raw — how big is a pixel?

A 1920 × 1080 image stored uncompressed:

1920 × 1080 pixels × 3 bytes (RGB) = 6,220,800 bytes ≈ 5.93 MB

BMP and uncompressed TIFF are roughly this size. Every compression format's job is to lose as little perceived quality as possible while shrinking that number.
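
The arithmetic above can be checked with a minimal sketch. The function name `raw_size_bytes` is made up for illustration, and real BMP or TIFF files add a small header on top of this figure.

```python
def raw_size_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed pixel data only: no headers, no row padding."""
    return width * height * bytes_per_pixel

size = raw_size_bytes(1920, 1080)
print(size)                            # 6220800
print(round(size / 2**20, 2), "MiB")   # 5.93 MiB
```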

Lossless vs lossy — the core fork

  • Lossless — decoding produces bit-identical pixels. Right for screenshots, diagrams, text, and anything where small artifacts matter. PNG / GIF / WebP-lossless / AVIF-lossless / TIFF.
  • Lossy — throws away information humans don't easily notice. Right for photographs and natural images. JPEG / WebP-lossy / AVIF / HEIC.

"Quality" is just a knob on how much to throw away. JPEG at quality 70 vs 95 is a big file-size delta with little visible difference.

PNG — lossless palette + DEFLATE

Born in 1996 to dodge GIF licensing. Three ingredients:

1. Palette indexing (optional)

If the image uses 256 or fewer colors (screenshots, icons, logos), each pixel can be a 1-byte palette index instead of 3 bytes of RGB:

Palette (up to 256 colors):
  [0] = #FFFFFF
  [1] = #000000
  [2] = #0066CC
  ...
  [255] = #FF0000

Image data:
  one byte per pixel → about 1/3 the size

Photos have millions of unique colors — palette mode doesn't apply. Diagrams and screenshots can be massively smaller in PNG-8 (palette mode) than PNG-24.
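
As a rough sketch of the idea (not how a PNG encoder is actually structured), palette indexing can be written in a few lines of Python. The helper name `to_palette` and the sample pixels are assumptions for illustration.

```python
def to_palette(pixels):
    """Map RGB tuples to (palette, indices). Fails past 256 distinct colors."""
    palette = []
    lookup = {}
    indices = bytearray()
    for rgb in pixels:
        if rgb not in lookup:
            if len(palette) == 256:
                raise ValueError("more than 256 colors; palette mode does not apply")
            lookup[rgb] = len(palette)
            palette.append(rgb)
        indices.append(lookup[rgb])
    return palette, bytes(indices)

# 400 pixels drawn from only three colors.
pixels = [(255, 255, 255), (0, 0, 0), (0, 102, 204), (255, 255, 255)] * 100
palette, indices = to_palette(pixels)

rgb_bytes = len(pixels) * 3                       # 3 bytes per pixel
indexed_bytes = len(indices) + len(palette) * 3   # 1 byte per pixel + palette
print(rgb_bytes, indexed_bytes)
```

The saving approaches one third of the RGB size as the image grows, because the palette itself is a fixed, small cost.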

2. Filters (prediction)

Each scanline gets a 1-byte filter type. Instead of storing raw pixel values, store the difference from a predictor:

  • None — store as-is
  • Sub — difference from the left pixel
  • Up — difference from the pixel above
  • Average — difference from the (left + above) / 2
  • Paeth — difference from whichever of left, above, or top-left is closest to (left + above − top-left)

For gradients or large flat regions, those differences are mostly zero or small numbers — exactly what DEFLATE compresses well.
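
A minimal sketch of three of these filters, assuming one byte per pixel and using made-up function names. Differences wrap modulo 256, as they do in PNG, so each filtered value still fits in a byte.

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Pick the neighbor closest to the estimate left + above - upper_left."""
    estimate = left + above - upper_left
    d_left = abs(estimate - left)
    d_above = abs(estimate - above)
    d_upper_left = abs(estimate - upper_left)
    if d_left <= d_above and d_left <= d_upper_left:
        return left
    if d_above <= d_upper_left:
        return above
    return upper_left

def filter_sub(row: bytes, bpp: int = 1) -> bytes:
    """Sub filter: each byte minus the byte one pixel to its left."""
    out = bytearray()
    for i, value in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out.append((value - left) % 256)
    return bytes(out)

def filter_up(row: bytes, prev_row: bytes) -> bytes:
    """Up filter: each byte minus the byte directly above it."""
    return bytes((v - u) % 256 for v, u in zip(row, prev_row))

gradient = bytes(range(100, 116))
print(list(filter_sub(gradient)))   # first byte 100, then all 1s
```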

3. DEFLATE

Same algorithm as ZIP — LZ77 (back-references to repeating patterns) + Huffman coding (variable-length codes by frequency):

Compressing "AAAABBBCCCCAAAA":
LZ77:    A [copy distance 1, length 3] BBB C [copy distance 1, length 3] [copy distance 11, length 4]
Huffman: common A → short bits, rare chars → long bits

Net result: PNG wins when the image has few colors (palette), gradients (filters), or large flat regions (DEFLATE). For photographs, PNG is almost always much larger than JPEG.
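
Python's standard `zlib` module implements DEFLATE, so the interaction between filtering and compression can be tried directly. This is a sketch on synthetic data (a repeated 0..255 ramp standing in for a gradient), not a PNG encoder; the exact byte counts depend on the zlib build, so none are quoted here.

```python
import zlib

# DEFLATE is lossless: decompressing returns the exact input.
text = b"AAAABBBCCCCAAAA"
roundtrip = zlib.decompress(zlib.compress(text))

# A smooth ramp, repeated, before and after a Sub-style filter.
ramp = bytes(range(256)) * 64
sub_filtered = bytes(
    (ramp[i] - (ramp[i - 1] if i else 0)) % 256 for i in range(len(ramp))
)

raw_size = len(zlib.compress(ramp, 9))
filtered_size = len(zlib.compress(sub_filtered, 9))
print(len(ramp), raw_size, filtered_size)
```

After filtering, almost every byte is 1, which is the kind of input LZ77 and Huffman coding handle best.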

JPEG — DCT + human visual tricks

Standardized in 1992. The dominant lossy photo format. Five-stage pipeline:

1. RGB → YCbCr color space

Humans are far more sensitive to brightness (Y) than to color (Cb / Cr). Separating them lets the encoder compress Cb and Cr more aggressively without perceived loss.
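
A small sketch of the conversion, assuming the full-range BT.601 coefficients that JFIF files conventionally use. The function name is made up for illustration.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Full-range RGB -> YCbCr, each component rounded and clamped to 0..255."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128)
print(rgb_to_ycbcr(128, 128, 128))  # (128, 128, 128)
print(rgb_to_ycbcr(255, 0, 0))      # (76, 85, 255)
```

Any gray pixel lands on Cb = Cr = 128: all of its information sits in Y, which is why the two chroma planes can be treated more roughly.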

2. Chroma subsampling

4:4:4 — full Y, Cb, Cr (no subsampling)
4:2:2 — Cb/Cr at half horizontal resolution → 33% saved
4:2:0 — Cb/Cr at half horizontal and vertical → 50% saved (most common)

Most JPEGs use 4:2:0. Small color details get fuzzy but humans rarely notice.
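
The savings figures follow from counting samples per pixel. The sketch below also shows one simple way to produce a 4:2:0 chroma plane by averaging 2×2 blocks; real encoders may use other downsampling filters, and both function names are assumptions for illustration.

```python
def samples_per_pixel(scheme: str) -> float:
    """One Y sample per pixel plus two chroma planes at reduced resolution."""
    divisors = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}
    horizontal, vertical = divisors[scheme]
    return 1 + 2 / (horizontal * vertical)

def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]

for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    saved = 1 - samples_per_pixel(scheme) / 3
    print(scheme, samples_per_pixel(scheme), f"{saved:.0%} saved")
```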

3. 8×8 blocks → DCT

Split the image into 8×8 pixel blocks. Run each block through a discrete cosine transform — convert pixel values into frequency coefficients:

spatial domain (8×8 pixels):    frequency domain (8×8 DCT):
[ 130 132 128 ... ]    →    [ 1024  -8   2 ... ]
[ 131 133 130 ... ]         [   -3   1   0 ... ]
[ 130 131 128 ... ]         [    0   0   0 ... ]
[ ...           ]           [ ...               ]
                             ↑
                             top-left = average (DC), high frequency to bottom-right

DCT itself loses nothing — same information, different representation. The trick is that natural photos pile most of their energy into the low-frequency (top-left) coefficients, so we can throw away the high-frequency ones cheaply.
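
A direct, unoptimized sketch of the 8×8 forward DCT, written straight from the textbook formula. Real JPEG encoders first subtract 128 from every sample and use fast factorizations; this version skips both so that the numbers line up with the illustration above.

```python
import math

def dct_8x8(block):
    """2D DCT-II of an 8x8 block with JPEG's normalization."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

# A perfectly flat block: all energy ends up in the DC coefficient.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0]))   # 1024, i.e. 8 x the block average
```

With this normalization the top-left coefficient is 8 times the block average, and every other coefficient of a flat block is zero up to floating-point error.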

4. Quantization — the actual loss

Divide each DCT coefficient by an entry in the quantization table, then round to an integer:

original DCT:  1024  -8   2   1   0   0   0   0
Q-table:         16  11  10  16  24  40  51  61
after divide:    64  -1   0   0   0   0   0   0
                       ↑ most coefficients become 0 ↑

The quality slider just scales the Q-table. Q=90 → small table values (few zeros, big file). Q=30 → big table values (many zeros, small file, visible blockiness).
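
The row above can be reproduced with a short sketch. The `scale_table` function follows the quality-scaling convention used by the IJG libjpeg encoder as I understand it; other encoders map "quality" to tables differently, so treat that part as an assumption.

```python
def quantize(coeffs, table):
    """Divide by the table entry and round: this is where information is lost."""
    return [round(c / q) for c, q in zip(coeffs, table)]

def dequantize(levels, table):
    """Decoder side: multiply back. The rounding error cannot be recovered."""
    return [level * q for level, q in zip(levels, table)]

def scale_table(base, quality):
    """IJG-style quality scaling (assumed convention); quality 50 keeps the base table."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [max(1, min(255, (q * scale + 50) // 100)) for q in base]

coeffs = [1024, -8, 2, 1, 0, 0, 0, 0]
base = [16, 11, 10, 16, 24, 40, 51, 61]

levels = quantize(coeffs, base)
print(levels)                      # [64, -1, 0, 0, 0, 0, 0, 0]
print(dequantize(levels, base))    # [1024, -11, 0, 0, 0, 0, 0, 0]
print(scale_table(base, 90))       # [3, 2, 2, 3, 5, 8, 10, 12]
print(scale_table(base, 30))       # [27, 18, 17, 27, 40, 66, 85, 101]
```

Note that −8 comes back as −11 and the small coefficients come back as 0: that difference is the loss.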

5. Zigzag + Huffman

Flatten the 8×8 coefficients into a 1D zigzag (low-frequency first), then run-length encode the runs of zeros and Huffman encode the result.
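
A simplified sketch of the zigzag scan and zero run-length step. Real JPEG additionally codes the DC coefficient as a difference from the previous block and caps each zero run at 15; both are left out here, and the function names and the "EOB" string marker are illustrative.

```python
def zigzag_order(n: int = 8):
    """(row, col) pairs in zigzag order: diagonals of constant row + col,
    alternating direction from one diagonal to the next."""
    coords = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        coords,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )

def run_length_zeros(seq):
    """Encode as (zeros_before, value) pairs; "EOB" means only zeros remain."""
    out, run = [], 0
    for value in seq:
        if value == 0:
            run += 1
        else:
            out.append((run, value))
            run = 0
    if run:
        out.append("EOB")
    return out

# A quantized block with only two non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1] = 64, -1

scan = [block[r][c] for r, c in zigzag_order()]
print(zigzag_order()[:6])       # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_length_zeros(scan))   # [(0, 64), (0, -1), 'EOB']
```

Sixty-four coefficients collapse to two pairs and an end marker, which is what the Huffman stage then encodes.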

Hands-on: Image Resize & Compress lets you slide JPEG quality and see file size update live. Q=70 is often half the size of Q=95 or less, with little visible difference.

WebP — VP8 + a new lossless

Google, 2010. Two modes:

  • WebP lossy — reuses the intra-frame compression from the VP8 video codec. 25–35% smaller than JPEG.
  • WebP lossless — about 25% smaller than PNG. Uses predictors + LZ77 + Huffman + color transforms.

What makes WebP-lossy better than JPEG:

  • 16×16 macroblocks split into 4×4 DCT sub-blocks, with a Walsh-Hadamard pass over their DC terms
  • Arithmetic coding instead of Huffman
  • Intra-block prediction — guess this block from its neighbor and only store the difference
  • Loop filter smooths block boundaries, so JPEG-style blockiness is much less visible
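
Intra-block prediction can be illustrated with a deliberately simplified sketch: predict every pixel of a block as the mean of the already-decoded pixels above and to the left, and keep only the residual. VP8 offers several prediction modes and works on larger blocks; the single "DC" mode and the function names here are assumptions for illustration.

```python
def dc_predict(above, left):
    """One predicted value for the whole block: the mean of its decoded neighbors."""
    neighbors = list(above) + list(left)
    return round(sum(neighbors) / len(neighbors))

def residual(block, above, left):
    """What the encoder stores: actual pixels minus the prediction."""
    p = dc_predict(above, left)
    return [[v - p for v in row] for row in block]

def reconstruct(res, above, left):
    """Decoder side: the same prediction plus the stored residual."""
    p = dc_predict(above, left)
    return [[v + p for v in row] for row in res]

block = [[120, 121], [122, 123]]
above, left = [119, 121], [120, 120]

res = residual(block, above, left)
print(res)                                      # [[0, 1], [2, 3]]
print(reconstruct(res, above, left) == block)   # True
```

The residual values are small numbers near zero, so they cost far fewer bits after the transform and entropy coding than the raw pixel values would.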

WebP also supports an alpha channel in lossy mode, which beats translucent PNG on size. A solid default for modern web images.

AVIF — AV1 video codec's cousin

Specified in 2019. AVIF stores AV1 intra-frame data in a HEIF container:

  • About 50% smaller than JPEG at equivalent quality
  • 20–30% smaller than WebP
  • 10/12-bit color, HDR, wide gamut
  • Chroma subsampling options (4:4:4 available)
  • Efficient alpha channel

Downsides:

  • Encoding is slow — 50–100× JPEG time
  • Browser support arrived later than WebP (all major browsers OK as of 2024)
  • For very small images (≤1 KB), header overhead can make AVIF larger

HEIC — Apple's photo format

Default for iPhone photos (iOS 11+). HEVC (H.265) intra-frame. Similar efficiency to AVIF, but licensing kept it out of web standards — mostly an Apple-ecosystem story.

Moving iPhone photos to a Windows machine often forces a JPG conversion to maintain compatibility.

Real numbers — the same photo

Format              Mode      Size (1920×1280 photo)   Quality
BMP / uncompressed  n/a       ~7 MB                    Full
PNG                 Lossless  ~4–5 MB                  Full
JPEG Q=95           Lossy     ~800 KB                  Near-pristine
JPEG Q=70           Lossy     ~250 KB                  Web-grade
WebP Q=80           Lossy     ~180 KB                  ≈ JPEG Q=85
AVIF Q=70           Lossy     ~100 KB                  ≈ WebP

When to use what

  • Screenshots, diagrams, logos (≤ 256 colors) — PNG-8 palette. Lossy modes just add noise.
  • Screenshots and UI captures (full color) — PNG-24 or WebP-lossless. WebP-lossless usually ~25% smaller.
  • General photos (blogs, social) — WebP Q=80 or JPEG Q=85.
  • High-quality photos (portfolio, print) — JPEG Q=95+. PNG would be huge.
  • Modern web only — AVIF first, WebP fallback, JPEG ultimate fallback via <picture>.
  • Icons / pixel-exact colors (games, app UI) — PNG. No loss.
  • Animation — GIF (legacy) / WebP / APNG / AVIF. WebP is roughly 1/3 of GIF.
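
The list above can be condensed into a small decision helper. This is a hypothetical function written for this guide, not part of any library; the content-type labels and return strings are assumptions, and real projects will want their own rules.

```python
from typing import Optional

def pick_format(content: str, colors: Optional[int] = None, modern_only: bool = False) -> str:
    """Hypothetical helper mirroring the recommendations in the list above."""
    if content == "animation":
        return "WebP"
    if content == "icon":
        return "PNG"
    if content in ("logo", "diagram", "screenshot"):
        if colors is not None and colors <= 256:
            return "PNG-8"
        return "WebP-lossless" if modern_only else "PNG-24"
    if content == "print":
        return "JPEG Q=95"
    if content == "photo":
        # AVIF still needs WebP and JPEG fallbacks for older clients.
        return "AVIF" if modern_only else "WebP Q=80"
    raise ValueError(f"unknown content type: {content!r}")

print(pick_format("logo", colors=12))            # PNG-8
print(pick_format("photo"))                      # WebP Q=80
print(pick_format("photo", modern_only=True))    # AVIF
```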

Common pitfalls

1. Re-saving JPEGs

Open a JPEG, edit, save, open again, edit, save — each step loses more even at the "same" quality. Keep an original PNG/RAW and export to JPEG only as the last step.

2. PNG for photographs

Teams "play it safe" by saving photos as PNG and end up with files 10× larger than they need. Photos belong in lossy.

3. JPEG with alpha

JPEG has no transparency. Saving a translucent logo as JPEG flattens it onto a black/white background. Use PNG / WebP / AVIF for translucency.

4. Progressive vs baseline JPEG

Progressive JPEGs render gradually as the file streams — smoother experience on slow mobile. Same quality means roughly the same size.

5. EXIF dropped

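Tools or CDN optimizers can strip EXIF, including the orientation tag, so your photo arrives sideways. When that matters, use an optimizer that preserves metadata (for example, jpegtran with -copy all), or apply the rotation to the pixels before the metadata is stripped.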

Summary

  • A raw 1920×1080 image is ~6 MB. Every format chases a smaller number from there.
  • PNG is lossless via palette + filters + DEFLATE. Best for screenshots, logos, diagrams.
  • JPEG is the photo king from 1992 — RGB → YCbCr → DCT → quantize → Huffman.
  • WebP is ~25–35% smaller than JPEG, with a lossless mode that beats PNG too.
  • AVIF cuts JPEG by ~50% at the cost of slow encoding. A strong candidate for the next web default.
  • Pick the format by content type (photo / diagram / logo) + compatibility + encoding-time budget.
  • Play with it: Image Resize & Compress lets you flip formats and quality side-by-side. Image to Base64 (Data URI) for inlining small images.