Example
Input (PDF + options)
File: report-2026.pdf (12 pages)
Mode: per-page (separated by page)
Format: Markdown
Output (Markdown)
## Page 1
yutils Usage Analysis Report
May 13, 2026
## Page 2
Summary
- Tool entry path: search 65%, favorites 22%
- Most used tools: Base64, JSON Formatter, JWT ...
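The per-page layout shown above can be sketched as a small formatter. This is an illustrative sketch, not the tool's actual code: `toPerPageMarkdown` is a hypothetical helper, and it assumes each page's text has already been extracted into a string.

```typescript
// Hypothetical helper (not from the tool's source): joins already-extracted
// page texts into per-page Markdown, one "## Page N" heading per page.
function toPerPageMarkdown(pageTexts: string[]): string {
  return pageTexts
    .map((pageText, index) => `## Page ${index + 1}\n\n${pageText.trim()}`)
    .join("\n\n");
}

const sampleMarkdown = toPerPageMarkdown([
  "yutils Usage Analysis Report\nMay 13, 2026",
  "Summary\n- Tool entry path: search 65%, favorites 22%",
]);
console.log(sampleMarkdown);
```

Page numbers are 1-based here to match the headings in the example output.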
Note
Pulls from the PDF's text layer — scanned or image-only PDFs return empty results (no OCR). Everything runs locally via pdfjs-dist.
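To make "text layer" concrete: pdf.js exposes each page's text as a list of items, each carrying a `str` field and a `hasEOL` flag. The sketch below joins such items into plain text. `TextItemLike` is a local stand-in for that item shape and `joinTextItems` is a hypothetical helper; how this tool actually assembles its output is an assumption.

```typescript
// Local stand-in for the subset of a pdf.js text item used in this sketch.
interface TextItemLike {
  str: string; // the text run
  hasEOL: boolean; // true when the run ends a line
}

// Hypothetical helper: concatenates text runs, breaking lines where hasEOL is set.
function joinTextItems(items: TextItemLike[]): string {
  let joined = "";
  for (const item of items) {
    joined += item.str;
    if (item.hasEOL) joined += "\n";
  }
  return joined.trim();
}

// A scanned page has no text items, so the result is an empty string.
console.log(JSON.stringify(joinTextItems([]))); // ""

console.log(
  joinTextItems([
    { str: "Summary", hasEOL: true },
    { str: "Tool entry path: ", hasEOL: false },
    { str: "search 65%", hasEOL: false },
  ])
);
```

Because the helper only reads what the text layer provides, an image-only page yields nothing to join, which is why no OCR means no output.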
Usage / FAQ
When to use
- Convert PDF reports, papers, or specs to Markdown quickly
- Pull PDF text into a searchable / greppable form
- Excerpt PDF content as AI prompt input
- Extract just the pages you need from a long PDF
- Share PDF excerpts in email or Slack
FAQ
- Q. Does it handle scanned or image-only PDFs?
- A. No. Only the PDF's text layer is extracted, so scanned or photographed PDFs need OCR first. An empty result usually means the PDF is image-based.
- Q. Is my file uploaded?
- A. No. The file is parsed locally with pdfjs-dist, so both the file and the extracted text stay in your browser. Safe for confidential documents.
- Q. How are tables and figures handled?
- A. Tables flatten into cell-order text (structure isn't preserved). Text inside images can't be extracted without OCR. Complex tables may need manual cleanup.
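The rule of thumb from the first answer (an empty result usually means an image-based PDF) can be turned into a quick check. A minimal sketch, assuming the extracted text is available per page: `looksImageOnly` and `emptyPageNumbers` are hypothetical names, and treating whitespace-only pages as empty is a choice made here, not documented tool behavior.

```typescript
// Hypothetical check: true when there is at least one page and every page's
// extracted text is empty or whitespace-only.
function looksImageOnly(pageTexts: string[]): boolean {
  return (
    pageTexts.length > 0 &&
    pageTexts.every((pageText) => pageText.trim() === "")
  );
}

// Hypothetical helper: 1-based numbers of pages with no extractable text.
function emptyPageNumbers(pageTexts: string[]): number[] {
  const emptyPages: number[] = [];
  pageTexts.forEach((pageText, index) => {
    if (pageText.trim() === "") emptyPages.push(index + 1);
  });
  return emptyPages;
}

console.log(looksImageOnly(["", "  \n"])); // true
console.log(looksImageOnly(["", "Summary"])); // false
console.log(JSON.stringify(emptyPageNumbers(["", "Summary", " "]))); // [1,3]
```

A mixed result (some empty pages, some not) often points to a document with a few scanned inserts rather than a fully image-based file.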
Fun facts
Text extraction from PDF is hard because a PDF is a sequence of rendering commands ('draw this glyph at this coordinate'), not a paragraph structure. Line breaks, paragraphs, and table structure are heuristic guesses from coordinates, so different tools give different results from the same PDF.
ISO 32000-1 §7.8 Content Streams
More PDFs than you'd expect are essentially un-extractable without OCR: scanned images embedded in a PDF (old documents, scanner output, re-printed PDFs). Quick check: try to select text in a PDF viewer. If you can't, you need OCR.
Wikipedia: OCR
pdfjs-dist (used here) is the npm distribution of pdf.js, Mozilla's pure-JavaScript PDF renderer. Firefox's built-in PDF viewer is built on it, which makes it the de facto standard for PDF rendering on the web. First released in 2011, it is still actively maintained.
Mozilla pdf.js
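The coordinate-based guessing described in the first fact can be illustrated with a toy line reconstructor. This is a sketch under assumptions, not how pdf.js or this tool does it: `PositionedRun`, `groupIntoLines`, and the `yTolerance` parameter are all hypothetical, and the runs are assumed to carry page coordinates with y increasing upward, as in PDF user space.

```typescript
// Hypothetical shape: a text run plus the position where it is drawn.
interface PositionedRun {
  str: string;
  x: number;
  y: number;
}

// Groups runs into lines: runs whose y differs from the line's first run
// by at most yTolerance are treated as the same line.
function groupIntoLines(runs: PositionedRun[], yTolerance = 2): string[] {
  // Sort top to bottom (larger y first), then left to right.
  const sorted = [...runs].sort((a, b) => b.y - a.y || a.x - b.x);
  const grouped: { y: number; parts: PositionedRun[] }[] = [];
  for (const run of sorted) {
    const current = grouped[grouped.length - 1];
    if (current && Math.abs(current.y - run.y) <= yTolerance) {
      current.parts.push(run);
    } else {
      grouped.push({ y: run.y, parts: [run] });
    }
  }
  // Within a line, order runs left to right and join them with spaces.
  return grouped.map((line) =>
    line.parts
      .sort((a, b) => a.x - b.x)
      .map((part) => part.str)
      .join(" ")
  );
}

const sampleRuns: PositionedRun[] = [
  { str: "65%", x: 200, y: 699.5 },
  { str: "Search", x: 50, y: 700 },
  { str: "Favorites", x: 50, y: 680 },
  { str: "22%", x: 200, y: 680 },
];

console.log(JSON.stringify(groupIntoLines(sampleRuns)));
// ["Search 65%","Favorites 22%"]
console.log(JSON.stringify(groupIntoLines(sampleRuns, 0)));
// ["Search","65%","Favorites 22%"]
```

The two calls differ only in tolerance, yet they split the first row differently. That is the same effect that makes different extractors disagree on the same PDF.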
Related tools
- JSON Formatter / Validator
Format, validate, and minify JSON strings. Adjust indent and optionally sort keys. Runs entirely in your browser.
- String Case Converter
Convert strings between camelCase, PascalCase, snake_case, kebab-case, CONSTANT_CASE, and Title Case — all six cases shown side-by-side.
- Regex Tester
Test JavaScript regular expressions with live match results. Supports g/i/m/s/u/y flags and capture groups.
- Markdown Preview
Render Markdown to HTML side-by-side. Supports CommonMark + GFM (tables, fenced code, task lists). marked is lazy-loaded.
- HTML → Markdown
Convert HTML into Markdown. Headings, lists, links, code, tables, blockquotes. Uses the browser's DOMParser: accurate, with zero dependencies.
- YAML ↔ JSON
Convert between YAML and JSON. Tolerates comments and multiline strings on the YAML side. yaml is lazy-loaded.
- Text Diff
Compare two texts and highlight added/removed lines, words, or characters.
- JSON Diff
Compare two JSON values, with optional key sorting and JSON-aware error messages.
- CSV ↔ JSON
Convert between CSV and JSON. Handles quoted fields, custom delimiters, and header rows.
- SQL Formatter
Format SQL queries with proper indentation and keyword casing. Supports PostgreSQL, MySQL, SQLite, and standard dialects.
- XML Formatter
Pretty-print or minify XML with attribute preservation. Handles SOAP, sitemaps, and config files.
- XML ↔ JSON
Convert between XML and JSON with attribute and element handling.
- Smart Paste
Paste any text and get tool recommendations — JSON, JWT, Base64, URL, UUID, Cron, and 9 more types auto-detected.
- Lorem Ipsum
Generate placeholder text in words, sentences, or paragraphs. Classic Lorem Ipsum or randomized.
- JSON Path
Query JSON with JSONPath expressions ($.store.book[*].author etc.) and inspect matches.
- JSON Schema Validator
Validate JSON data against a JSON Schema (Draft 2020-12). Powered by Ajv with format support.
- JSON Schema Generator
Generate a JSON Schema (Draft 2020-12) from a sample JSON. Infer types, required fields, and nested structures automatically.
- HTML Formatter
Beautify or minify HTML with proper indentation, attribute alignment, and configurable wrap.
- CSS Formatter
Beautify or minify CSS with proper indentation. Configurable selector and property style.
- JavaScript Formatter
Beautify or minify JavaScript with brace style and indent options. Powered by js-beautify.
- TOML ↔ JSON
Convert between TOML (Tom's Obvious Minimal Language) and JSON. Used in Cargo.toml, pyproject.toml, etc.
- INI ↔ JSON
Convert INI configuration to JSON and back. Supports sections, comments (; or #), and key=value.
- JSON → TypeScript
Generate TypeScript interfaces from a JSON sample. Nested objects become separate interfaces.
- JS Object → JSON
Convert a JavaScript object literal (unquoted keys, single quotes, trailing commas, comments) into standard JSON. Lenient parser, strict output.
- Slug Generator
Convert text into a URL-safe slug. Configurable separator, lowercase, and accent stripping.
- ASCII Tree
Convert indented text or path list into a box-drawing tree (├── │ └──).
- Diff Patch
Generate a unified diff (-u) patch from two text inputs. Compatible with `git apply` / `patch -p0`.
- Mock Data
Generate fake JSON records and SQL INSERT seed data: names, emails, custom fields, UUIDs, dates, and more. Zero dependencies.
- MongoDB Extended JSON
Convert MongoDB Extended JSON (EJSON) between Canonical and Relaxed forms, or strip BSON wrappers to plain JSON. Recognizes 16 wrapper types ($oid/$date/$numberLong/$numberDecimal/$binary/...).
- Kubernetes YAML Visualizer
Paste Kubernetes manifests and see the resource graph — Deployments, Services, Ingresses, ConfigMaps, Secrets, PVCs, and how they connect. yaml is lazy-loaded.
- Docker Compose Visualizer
Paste docker-compose.yml and see services, networks, volumes, and depends_on as an interactive graph. Client-side, lazy-loaded yaml.
- PPTX Text Extractor
Extract slide text from a .pptx file — plain / markdown / per-slide. Great for converting decks to markdown. All client-side.
- Regex Railroad Diagram
Visualize your regex as a railroad diagram — trace branches, groups, and quantifiers at a glance, in your browser.
- Word & Character Counter
Count characters (with/without spaces), words, sentences, paragraphs, lines, and bytes in real time — free, in your browser.
- LLM Token Counter
Count LLM tokens and estimate cost — exact for GPT (tiktoken), estimated for Claude/Gemini. Context-limit gauge. All in your browser.