Getting Better Extraction
The AI does well on most documents, but you can significantly improve results with these practices.
Document quality
Image resolution
Higher resolution = better OCR = better extraction.
| Resolution | Quality |
|---|---|
| 150 DPI | Minimum. May miss fine text |
| 300 DPI | Good. Recommended for most documents |
| 600 DPI | Excellent. Best for small text or detailed forms |
Scan quality
- Even lighting — avoid shadows across the document
- Flat surface — curled or folded pages reduce accuracy
- Full document — don't cut off edges; include the entire page
- Contrast — dark text on white background works best
Native vs. scanned PDFs
If you have the choice, native (digitally created) PDFs are always better than scanned copies:
- Native PDF — text is embedded, extraction is near-perfect
- Scanned PDF — requires OCR, quality depends on scan quality
Template strategies
Start broad, then narrow
- Run auto-detect on your first document
- See what the AI finds naturally
- Build a template focusing on the fields you need
- Add descriptions for fields the AI got wrong
Test before batch processing
Always test on 2-3 sample documents before processing a large batch. This catches template issues early.
When to use High Precision
- Complex multi-column layouts
- Documents with mixed handwritten and printed text
- Forms with very small text
- When standard mode misses specific fields
Document types that work best
| Document type | Expected accuracy |
|---|---|
| Typed invoices/receipts | Very high |
| Contracts and agreements | Very high |
| Government forms | High |
| Medical records | High |
| Handwritten notes | Moderate |
| Low-resolution photos | Variable |
When extraction fails
If the AI consistently gets poor results:
- Check the raw text — is the OCR reading the document correctly?
- Check your template — are field names and descriptions clear?
- Try High Precision — the slower mode may handle the layout better
- Improve source quality — can you get a higher resolution scan?
- Simplify your template — fewer, more specific fields often work better than many broad ones