yutils
Example

Input

File: report.pdf (2.4 MB, mostly text + tables)
Options: Strip metadata on

Output example (text-heavy, 50 pages)

Source 2.4 MB → Compressed 2.0 MB (16% saved)
Metadata (author, title) removed

Note

Results vary widely by content: text-heavy PDFs typically shrink 5-20%, image-heavy PDFs stay near 0%, and files built from already-compressed JPEGs may even grow by 1-2%.
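
The saved percentage in the example, including the negative case where the output grows, is plain arithmetic. A minimal sketch follows; the function name savings_percent and the byte counts are illustrative assumptions, not the tool's actual code. The rounded sizes in the example give about 16.7%, so the displayed 16% presumably comes from the exact byte counts or from truncation.

```python
def savings_percent(before_bytes: int, after_bytes: int) -> float:
    """Return the percentage saved; a negative value means the output grew."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return (before_bytes - after_bytes) / before_bytes * 100


# Sizes from the example, taken at face value as decimal megabytes.
print(f"{savings_percent(2_400_000, 2_000_000):.1f}% saved")

# A hypothetical already-compressed file that grew slightly after re-encoding.
print(f"{savings_percent(1_000_000, 1_015_000):.1f}% saved")
```

A negative result is the signal to keep the original file rather than the re-encoded one.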

Usage / FAQ

When to use

  • Trim a PDF that's just over an email attachment cap (10-25 MB)
  • Clean up inefficiently written PDFs from legacy tools (re-encoding alone often shrinks them)
  • Strip metadata (author / company / software) before external sharing
  • Diagnose why a PDF is unexpectedly large (a small saving suggests the bulk is images)
  • First step when batch-cleaning many PDFs (before pdf-merge, etc.)

FAQ

Q. Why doesn't this shrink image PDFs?
A. Images inside a PDF are usually already JPEG- or JPEG 2000-compressed. Real gains require down-sampling or lowering image quality, which is heavy to do client-side. Command-line or server-side tools (Ghostscript, qpdf) are the better choice for that.
Q. Does stripping metadata help security?
A. Partially. It removes forensic clues such as author, company, and source software, but other clues (image EXIF, text layout fingerprints) remain. For real sanitization, print-to-PDF followed by OCR is the canonical path.
Q. Can the output be larger than the input?
A. Yes. For PDFs that are already well compressed (for example, modern Acrobat output), the naive re-encoding here can slightly inflate the file. In that case the reported saving is negative.
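
The first FAQ answer names down-sampling as the real source of gains. A rough sketch of why: halving the resolution quarters the pixel count. The page size (US Letter, 8.5 x 11 in) and the helper name pixel_count are assumptions chosen for illustration. Compressed file size does not scale exactly with pixel count, so treat this as an upper bound on the effect, not a prediction.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Pixels in an image covering width_in x height_in inches at the given dpi."""
    return round(width_in * dpi) * round(height_in * dpi)


# A full-page scan on US Letter paper (assumed example size).
full = pixel_count(8.5, 11, 300)
half = pixel_count(8.5, 11, 150)

print(f"300 dpi: {full:,} pixels")
print(f"150 dpi: {half:,} pixels")
print(f"ratio:   {full / half:.1f}x")
```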

Fun facts

  • Object stream compression arrived with PDF 1.5 (2003). In PDF 1.4 and earlier, every object was written individually and only stream contents could be compressed; from 1.5 on, many non-stream objects can be packed into a single compressed object stream, which gives a meaningful size reduction.

    ISO 32000-1 §7.5.7 (Object Streams)
  • Images often account for 80-95% of a PDF's weight, while text barely contributes. So the real answer to 'shrink my PDF' is image down-sampling: going from 300 dpi to 150 dpi alone often halves the size.

    Wikipedia — PDF Compression
  • Ghostscript's -dPDFSETTINGS=/ebook (or /screen) is the canonical compression preset: it down-samples images, subsets fonts, and applies object streams automatically. It is the standard CLI trick.

    Ghostscript — PDF settings
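
The Ghostscript preset from the last fun fact can be driven from a script. This is a sketch, not part of this tool: the helper name ghostscript_args and the file names are made up for illustration, and the executable is assumed to be installed as gs (on Windows it is typically gswin64c). The flags themselves are standard Ghostscript pdfwrite options.

```python
import os
import shutil
import subprocess

# Presets accepted by Ghostscript's -dPDFSETTINGS option.
PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def ghostscript_args(src: str, dst: str, preset: str = "ebook") -> list[str]:
    """Build a Ghostscript pdfwrite command line for the given preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",                        # assumed executable name
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS=/{preset}",  # compression preset
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    args = ghostscript_args("report.pdf", "report-small.pdf")
    print(" ".join(args))
    # Run only when Ghostscript is installed and the input file exists.
    if shutil.which("gs") and os.path.exists("report.pdf"):
        subprocess.run(args, check=True)
```

Compare the output size with the input afterwards; /screen down-samples more aggressively than /ebook, so check image quality before replacing the original.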