OCR PDF — Make Scanned PDFs Searchable
Make scanned PDFs searchable and selectable. Runs entirely in your browser.
OCR is CPU-intensive
Recognition runs on your device — not a server. Expect a few seconds per page. For large batches, process in smaller groups. Speed depends on your device.
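The advice to process large batches in smaller groups can be sketched as a small helper. This is an illustration only, not the tool's own code: the `chunk` function, the file names, and the group size of 25 are hypothetical choices made for the example.

```typescript
// Hypothetical helper: split a list of files into groups of at most `size`,
// so one group can finish OCR before the next group is started.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Example: 60 scanned PDFs split into groups of 25 (an arbitrary size).
const exampleFiles = Array.from({ length: 60 }, (_, i) => `scan-${i + 1}.pdf`);
const exampleGroups = chunk(exampleFiles, 25);
console.log(exampleGroups.map((g) => g.length)); // group sizes: 25, 25, 10
```

A smaller group size keeps peak memory and waiting time per run lower on slower devices; the right number depends on your hardware and page counts.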
Drop files here
or click to browse · paste from clipboard
Accepts .PDF · Up to 1,000 files
How it works
Drop your files
Drag and drop, click to browse, or paste from clipboard. Up to 1,000 files at once.
Choose settings
Choose the document language and the output format (Searchable PDF or Extract text), and adjust any other options to match your needs.
Click Convert
Everything runs in your browser via WebAssembly. OCR happens locally, with no server involved.
Download
Download files individually or grab all at once as a ZIP.
Frequently asked questions
Which PDFs need OCR?
PDFs made from scans, where the pages are images with no selectable text. Common sources: documents photographed and exported to PDF, fax-to-PDF output, or PDFs exported from a scanner. If you can already select text in your PDF, it doesn't need OCR.
How accurate is the OCR?
Clean, high-contrast printed text on a white background: very high accuracy (95%+). Handwriting: not supported, because Tesseract is designed for printed text only. Faded, skewed, or low-resolution scans: accuracy drops. The tool works best with scans of 300 DPI or higher.
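To check a scan against the 300 DPI guideline, the arithmetic can be sketched as follows. It relies on the PDF convention that one point is 1/72 inch, and it assumes the scanned image fills the whole page and the page uses the default unit. The function names are hypothetical and not part of this tool.

```typescript
const POINTS_PER_INCH = 72; // default PDF user-space unit: 1 point = 1/72 inch

// Estimate a scan's resolution from the embedded image width (pixels)
// and the page width (PDF points). Assumes the image spans the full page.
function effectiveDpi(imageWidthPx: number, pageWidthPt: number): number {
  return imageWidthPx / (pageWidthPt / POINTS_PER_INCH);
}

// Pixel dimensions a page would have when rasterised at a target DPI.
function pixelSizeAtDpi(
  widthPt: number,
  heightPt: number,
  dpi: number,
): { width: number; height: number } {
  return {
    width: Math.round((widthPt * dpi) / POINTS_PER_INCH),
    height: Math.round((heightPt * dpi) / POINTS_PER_INCH),
  };
}

// US Letter is 612 x 792 points (8.5 x 11 inches).
console.log(effectiveDpi(1700, 612)); // 200, below the 300 DPI guideline
console.log(pixelSizeAtDpi(612, 792, 300)); // width 2550, height 3300
```

A Letter-size scan narrower than about 2550 pixels is therefore under 300 DPI, and rescanning at a higher resolution is likely to help more than any setting change.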
Why is OCR slower than the other tools?
OCR runs on your device's CPU rather than on a server. The benefit is that your document never leaves your browser; the cost is speed. Most tools here are instant because they use optimised WASM libraries. OCR is computationally heavier, so a few seconds per page is normal.
Will OCR change how my PDF looks?
No. In Searchable PDF mode, the original scan is preserved exactly. An invisible text layer is added: you can't see it, but search tools and screen readers can find the text.
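For the curious, one standard way a PDF can carry text that is searchable but not drawn is text rendering mode 3 (the `3 Tr` operator in a page content stream). The sketch below builds such a fragment as a string. It illustrates the general PDF mechanism, not this tool's internals: the function names and the `F1` font resource are hypothetical, and the example handles plain ASCII text only.

```typescript
// Escape the characters that are special inside a PDF literal string.
function escapePdfString(text: string): string {
  return text.replace(/[\\()]/g, (ch) => `\\${ch}`);
}

// Build content-stream operators that place `text` at (x, y) in points
// using rendering mode 3, which neither fills nor strokes the glyphs.
// Assumes the page resources define a font under the given name.
function invisibleTextOps(
  text: string,
  x: number,
  y: number,
  fontSize: number,
  fontName = "F1", // hypothetical font resource name
): string {
  return [
    "BT", // begin text object
    `/${fontName} ${fontSize} Tf`, // select font and size
    "3 Tr", // text rendering mode 3: invisible
    `${x} ${y} Td`, // move to the text position
    `(${escapePdfString(text)}) Tj`, // show the string
    "ET", // end text object
  ].join("\n");
}

console.log(invisibleTextOps("Invoice (copy)", 72, 700, 12));
```

Because the glyphs are never painted, the scanned image underneath stays untouched, while text extraction and search still see the string at that position.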
What is the difference between Searchable PDF and Extract text?
Searchable PDF keeps the original document layout and adds a hidden text layer. Extract text discards the page images and layout and outputs just the recognised words in a plain .txt file. Use Extract text when you only care about the content, not the layout.
Can I run OCR on a PDF that already has text?
Yes. The tool adds an OCR text layer regardless of whether the PDF already has text. Pages with existing text will get a duplicate layer, which is harmless but unnecessary. For mixed PDFs (some pages scanned, some already text), the result is still usable.
Does the language setting matter?
Yes. Always select the language that matches your document. The wrong language model will produce garbled output even on a clean scan. For a document that mixes two languages, pick the dominant one.
Are my files uploaded to a server?
No. OCR runs entirely in your browser using Tesseract.js and WebAssembly. Your PDF never leaves your device. Language model files (~10–15 MB per language) are downloaded from a public CDN on first use and cached; this is the only network request during processing.