Getting Better Extraction

The AI does well on most documents, but you can significantly improve results with these practices.

Document quality

Image resolution

Higher resolution gives the OCR more detail to work with, which leads to better extraction.

Resolution    Quality
150 DPI       Minimum. May miss fine text
300 DPI       Good. Recommended for most documents
600 DPI       Excellent. Best for small text or detailed forms
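
If you only know an image's pixel dimensions, you can estimate its effective resolution by dividing the pixel width by the physical page width in inches. The sketch below is illustrative only: the function names are hypothetical, and the "Below minimum" label for anything under 150 DPI is an assumption, since the table above does not cover that case.

```python
def effective_dpi(width_px: int, page_width_in: float) -> float:
    """Estimate scan resolution from pixel width and physical page width (inches)."""
    if page_width_in <= 0:
        raise ValueError("page_width_in must be positive")
    return width_px / page_width_in


def resolution_tier(dpi: float) -> str:
    """Return the highest tier from the table whose threshold the DPI reaches."""
    if dpi >= 600:
        return "Excellent"
    if dpi >= 300:
        return "Good"
    if dpi >= 150:
        return "Minimum"
    return "Below minimum"  # assumption: not a tier listed in the table


# A US Letter page (8.5 in wide) scanned to 2550 pixels across works out to 300 DPI.
dpi = effective_dpi(2550, 8.5)
print(round(dpi), resolution_tier(dpi))  # prints: 300 Good
```

A phone photo of the same page that is only 1200 pixels wide would come out at roughly 141 DPI, below the 150 DPI minimum.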

Scan quality

  • Even lighting — avoid shadows across the document
  • Flat surface — curled or folded pages reduce accuracy
  • Full document — don't cut off edges; include the entire page
  • Contrast — dark text on white background works best

Native vs. scanned PDFs

If you have the choice, native (digitally created) PDFs almost always extract more accurately than scanned copies:

  • Native PDF — text is embedded, extraction is near-perfect
  • Scanned PDF — requires OCR, quality depends on scan quality

Template strategies

Start broad, then narrow

  1. Run auto-detect on your first document
  2. See what the AI finds naturally
  3. Build a template focusing on the fields you need
  4. Add descriptions for fields the AI got wrong

Test before batch processing

Always test on 2-3 sample documents before processing a large batch. This catches template issues early.
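
One way to run such a test is a small script that extracts each sample and lists any required fields that came back missing or empty. This is a minimal sketch under stated assumptions: `check_samples`, `fake_extract`, the file names, and the field names are all hypothetical, and `fake_extract` stands in for whatever extraction call you actually use.

```python
from typing import Callable, Dict, List


def check_samples(
    extract: Callable[[str], Dict[str, str]],
    sample_paths: List[str],
    required_fields: List[str],
) -> Dict[str, List[str]]:
    """Return, per sample, the required fields that were missing or empty."""
    problems: Dict[str, List[str]] = {}
    for path in sample_paths:
        result = extract(path)
        missing = [name for name in required_fields if not result.get(name)]
        if missing:
            problems[path] = missing
    return problems


# Hypothetical stand-in for a real extraction call, so the sketch runs on its own.
def fake_extract(path: str) -> Dict[str, str]:
    canned = {
        "invoice_a.pdf": {"invoice_number": "A-100", "total": "42.00"},
        "invoice_b.pdf": {"invoice_number": "B-200", "total": ""},
    }
    return canned.get(path, {})


problems = check_samples(
    fake_extract,
    ["invoice_a.pdf", "invoice_b.pdf", "invoice_c.pdf"],
    ["invoice_number", "total"],
)
if problems:
    print("Fix the template before running the batch:", problems)
```

An empty result means every sample returned every required field, which is a reasonable signal that the template is ready for the larger batch.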

When to use High Precision

  • Complex multi-column layouts
  • Documents with mixed handwritten and printed text
  • Forms with very small text
  • When standard mode misses specific fields
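
The last case above can be automated as a fallback: run standard mode first and retry in High Precision only when required fields come back empty. The sketch below assumes a hypothetical `extract(path, mode)` callable and made-up mode strings (`"standard"`, `"high_precision"`); it does not reflect an actual API.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (document path, mode name) -> extracted fields
ExtractFn = Callable[[str, str], Dict[str, str]]


def extract_with_fallback(
    extract: ExtractFn, path: str, required_fields: List[str]
) -> Dict[str, str]:
    """Try standard mode; retry in High Precision if any required field is missing."""
    result = extract(path, "standard")
    missing = [name for name in required_fields if not result.get(name)]
    if not missing:
        return result
    # Standard mode missed something, so pay for the slower mode on this document only.
    return extract(path, "high_precision")


calls: List[str] = []


# Hypothetical stub: standard mode misses the date, High Precision finds it.
def fake_extract(path: str, mode: str) -> Dict[str, str]:
    calls.append(mode)
    if mode == "standard":
        return {"name": "Ada", "date": ""}
    return {"name": "Ada", "date": "2024-01-01"}


result = extract_with_fallback(fake_extract, "form.png", ["name", "date"])
print(result, calls)
```

Retrying per document keeps the slower mode off the documents that standard mode already handles well.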

Document types that work best

Document type               Expected accuracy
Typed invoices/receipts     Very high
Contracts and agreements    Very high
Government forms            High
Medical records             High
Handwritten notes           Moderate
Low-resolution photos       Variable

When extraction fails

If the AI consistently gets poor results:

  1. Check the raw text — is the OCR reading the document correctly?
  2. Check your template — are field names and descriptions clear?
  3. Try High Precision — the slower mode may handle the layout better
  4. Improve source quality — can you get a higher resolution scan?
  5. Simplify your template — fewer, more specific fields often work better than many broad ones