Examples & Best Practices

Practical template examples and tips for effective document extraction.

Tags: templates, examples, best-practices

Published: 3/31/2026

Template Examples

Example 1: Thai Tax Invoice Extraction (JSON)

This template extracts all required fields from a standard Thai tax invoice and returns them as JSON for database storage.

Template Settings:

  • Name: Thai Tax Invoice
  • Result Format: json
  • AI Provider: OpenAI gpt-4o

Schema:

{
  "type": "object",
  "properties": {
    "invoice_number": {
      "type": "string",
      "description": "Invoice number as printed on the document"
    },
    "invoice_date": {
      "type": "string",
      "description": "Issue date in YYYY-MM-DD. Convert Thai B.E. to C.E. by subtracting 543."
    },
    "seller_name": { "type": "string", "description": "Seller company name" },
    "seller_tax_id": { "type": "string", "description": "Seller 13-digit Thai tax ID" },
    "seller_address": { "type": "string" },
    "buyer_name": { "type": "string", "description": "Buyer company or person name" },
    "buyer_tax_id": { "type": "string", "description": "Buyer 13-digit Thai tax ID if present" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "amount": { "type": "number" }
        }
      }
    },
    "subtotal": { "type": "number", "description": "Amount before VAT" },
    "vat_rate": { "type": "number", "description": "VAT rate as decimal, e.g. 0.07" },
    "vat_amount": { "type": "number" },
    "total_amount": { "type": "number", "description": "Grand total including VAT" },
    "currency": { "type": "string", "description": "Currency, default THB" }
  }
}

Instructions:

You are a Thai tax invoice extraction assistant. Extract all data exactly as shown on the invoice.
Rules:
- Convert all Thai Buddhist Era (B.E.) dates to Christian Era (C.E.) by subtracting 543
- Return all monetary values as plain numbers without commas or symbols
- If a field is absent, return null
- Preserve original Thai company names exactly as written
- Include every line item in the table
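
The rules above can be spot-checked after extraction. The following is a minimal sketch, not part of the product: the function names be_to_ce and check_invoice are hypothetical, the 400-year threshold for spotting an unconverted B.E. year is an assumed heuristic, and the input is assumed to be the extracted JSON already parsed into a Python dict.

```python
import re
from datetime import date

BE_OFFSET = 543  # B.E. year minus 543 gives the C.E. year


def be_to_ce(year_be: int, month: int, day: int) -> str:
    """Convert a Buddhist Era date to an ISO C.E. date string (YYYY-MM-DD)."""
    return date(year_be - BE_OFFSET, month, day).isoformat()


def check_invoice(inv: dict, tolerance: float = 0.01) -> list:
    """Return a list of problems found in one extracted invoice (hypothetical helper)."""
    problems = []

    # Date must be YYYY-MM-DD; a year far in the future suggests B.E. was not converted.
    raw_date = inv.get("invoice_date")
    if raw_date is not None:
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw_date):
            problems.append("invoice_date is not YYYY-MM-DD")
        elif int(raw_date[:4]) > date.today().year + 400:
            problems.append("invoice_date looks like an unconverted B.E. year")

    # Tax IDs, when present, must contain exactly 13 digits (spaces and hyphens ignored).
    for field in ("seller_tax_id", "buyer_tax_id"):
        value = inv.get(field)
        if value is not None and not re.fullmatch(r"\d{13}", re.sub(r"[\s-]", "", value)):
            problems.append(f"{field} is not 13 digits")

    # Totals must add up when all three amounts were extracted as numbers.
    parts = [inv.get(k) for k in ("subtotal", "vat_amount", "total_amount")]
    if all(isinstance(p, (int, float)) for p in parts):
        subtotal, vat, total = parts
        if abs(subtotal + vat - total) > tolerance:
            problems.append("subtotal + vat_amount does not equal total_amount")

    return problems
```

An empty list means the record passed these three checks. Null fields are skipped rather than reported, which matches the rule that absent fields return null.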

TIP

Before processing real documents, test your template with 3–5 sample invoices that cover different layouts and suppliers. Verify that the extracted JSON matches the expected values before deploying to a production workflow. This catches schema issues early, saves credits, and keeps bad data out of downstream systems.


Example 2: Expense Receipt Scanner (CSV)

This template scans expense receipts and exports results to CSV for expense report processing.

Template Settings:

  • Name: Expense Receipt
  • Result Format: csv
  • CSV Config: csvColumnsOnly: true
  • AI Provider: Google Gemini gemini-2.0-flash

Schema:

{
  "type": "object",
  "properties": {
    "date": { "type": "string", "description": "Receipt date YYYY-MM-DD" },
    "merchant": { "type": "string", "description": "Store or merchant name" },
    "category": {
      "type": "string",
      "description": "Expense category: food, transport, accommodation, office_supplies, other"
    },
    "amount": { "type": "number", "description": "Total amount paid" },
    "currency": { "type": "string" },
    "payment_method": {
      "type": "string",
      "description": "cash, credit_card, debit_card, promptpay, or other"
    },
    "vat_included": {
      "type": "boolean",
      "description": "True if receipt shows VAT included"
    },
    "vat_amount": { "type": "number" }
  }
}

CSV Column Order:

["date", "merchant", "category", "amount", "currency", "payment_method", "vat_amount"]

Instructions:

Extract expense data from this receipt. Categorize the expense as one of: food, transport, accommodation, office_supplies, or other.
For Thai receipts with PromptPay QR codes, set payment_method to "promptpay".
All amounts in original currency — do not convert.
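
To illustrate how the column order shapes the output, here is a minimal local sketch. It assumes that csvColumnsOnly: true means only the listed columns are exported, which would explain why vat_included is in the schema but not in the CSV. The function receipts_to_csv is hypothetical and is not the platform's exporter.

```python
import csv
import io

# Column order copied from the template above; vat_included is intentionally absent.
COLUMN_ORDER = [
    "date", "merchant", "category", "amount",
    "currency", "payment_method", "vat_amount",
]


def receipts_to_csv(records: list, columns: list = COLUMN_ORDER) -> str:
    """Write extracted receipt dicts to CSV text, keeping only the listed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # drop fields that are not in the column order
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Missing keys and None values both become empty cells.
        writer.writerow(record)
    return buffer.getvalue()
```

Each extracted receipt becomes one row, so a flat schema like this one maps to CSV without any column renaming.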

Example 3: Document Summary (Free Text)

This template generates a plain-text summary of any document, useful for search indexing or quick review.

Template Settings:

  • Name: Document Summary
  • Result Format: text
  • AI Provider: Anthropic claude-sonnet-4-5

Schema: (none required for free text)

Instructions:

Read the entire document and produce a structured summary with these sections:
 
1. DOCUMENT TYPE: Identify what kind of document this is
2. KEY PARTIES: List all people, companies, or organizations mentioned
3. MAIN SUBJECT: One paragraph describing what the document is about
4. KEY DATES: List all significant dates and their meaning
5. IMPORTANT NUMBERS: List all significant monetary amounts, quantities, or reference numbers
6. ACTION ITEMS: Any tasks, obligations, or deadlines mentioned
 
Write in English regardless of the source document language.
Keep the total summary under 400 words.
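
Because free-text output has no schema to validate against, a light check on the returned text can confirm the instructions were followed. This is a sketch under assumptions: check_summary is a hypothetical helper, and it assumes the model echoes the section names exactly as written in the instructions.

```python
EXPECTED_SECTIONS = [
    "DOCUMENT TYPE", "KEY PARTIES", "MAIN SUBJECT",
    "KEY DATES", "IMPORTANT NUMBERS", "ACTION ITEMS",
]
MAX_WORDS = 400  # the instructions ask for a summary under 400 words


def check_summary(text: str) -> list:
    """Return a list of problems found in one generated summary."""
    problems = []
    for name in EXPECTED_SECTIONS:
        if name not in text:
            problems.append(f"missing section: {name}")
    word_count = len(text.split())  # whitespace-separated words
    if word_count >= MAX_WORDS:
        problems.append(f"summary has {word_count} words, limit is under {MAX_WORDS}")
    return problems
```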

Best Practices

Schema Design

  • Name fields descriptively: invoice_issue_date is clearer than date1.
  • Always include description — Every field should have a description that acts as a mini-instruction for the AI.
  • Use null-friendly types — Avoid requiring fields that may not always be present. Let missing fields return null.
  • Test with edge cases — Run your template against documents that are scanned at an angle, have low resolution, or have unusual layouts.
  • Keep schemas focused — A template with 8 well-defined fields outperforms one with 25 poorly-defined fields.
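
One way to express null-friendly types in JSON Schema is a type array such as ["string", "null"]. The sketch below rewrites a schema that way. The helper make_nullable is hypothetical, and whether a given provider or template editor accepts type arrays is an assumption to verify before relying on it.

```python
import copy


def make_nullable(schema: dict) -> dict:
    """Return a copy of an object schema whose scalar properties also accept null."""
    out = copy.deepcopy(schema)
    for prop in out.get("properties", {}).values():
        field_type = prop.get("type")
        if field_type == "object":
            # Recurse into nested objects.
            prop.update(make_nullable(prop))
        elif field_type == "array":
            # Arrays stay arrays; only the fields of their items become nullable.
            if isinstance(prop.get("items"), dict):
                prop["items"] = make_nullable(prop["items"])
        elif isinstance(field_type, str):
            prop["type"] = [field_type, "null"]
    return out
```

Applied to a property declared as { "type": "string" }, this yields { "type": ["string", "null"] }, so a missing value validates as null instead of forcing the model to invent one.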

AI Model Selection

  • Start with gpt-4o-mini or gemini-2.0-flash for cost efficiency during testing.
  • Upgrade to gpt-4o or gemini-2.5-pro for production when accuracy is critical.
  • Use Qwen models for documents primarily in Chinese.
  • Use Kimi moonshot-v1-32k for very long documents (contracts, reports).

Instructions

  • Write instructions in the same language as the document when possible — or explicitly state the document language.
  • Add example values in your instructions when the format is ambiguous: "Return date as YYYY-MM-DD, e.g. 2024-11-15".
  • Keep instructions under 500 words — Very long instructions can confuse the AI.
  • Version your instructions — When you update instructions, note the change date in the template description so you can track which version processed which documents.

Output Formats

  • Use JSON as the default — it is the most flexible and easiest to transform into other formats later.
  • Use CSV only when the schema is flat — Deeply nested schemas produce awkward column names in CSV.
  • Use PDF/DOCX only for human-readable reports, not for automated pipelines.
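
To see why nested schemas fit CSV poorly, consider what happens when a nested result is flattened into columns. The sketch below uses dotted paths, which is one common convention and an assumption here; the platform's actual column naming may differ. The sample record is made up for illustration.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        # Scalar leaf: the accumulated path becomes the column name.
        flat[prefix] = value
    return flat


# Hypothetical extraction result with two line items.
record = {
    "invoice_number": "INV-1",
    "line_items": [
        {"description": "A", "amount": 10},
        {"description": "B", "amount": 5},
    ],
}
columns = list(flatten(record))
```

Here columns contains names such as line_items.0.description and line_items.1.amount, and the column count changes with the number of line items in each document. Keeping such results in JSON avoids the problem.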

Testing

  1. Collect 5–10 representative sample documents before deploying a template.
  2. Include edge cases: faded text, handwriting, multi-page documents, missing fields.
  3. Compare AI output against manually extracted data for accuracy measurement.
  4. Monitor processing history for recurring errors and refine instructions accordingly.
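
Step 3 can be made concrete with a per-field accuracy score. The following is a minimal sketch: field_accuracy is a hypothetical helper, it assumes both the AI output and the manually extracted data are available as Python dicts, and it uses exact equality, so values that differ only in formatting count as errors.

```python
def field_accuracy(pairs: list, fields: list) -> dict:
    """Compute the fraction of documents in which each field matched exactly.

    pairs is a list of (extracted, expected) dict tuples, one per sample document.
    """
    correct = {field: 0 for field in fields}
    for extracted, expected in pairs:
        for field in fields:
            if extracted.get(field) == expected.get(field):
                correct[field] += 1
    total = len(pairs)
    return {field: (correct[field] / total if total else 0.0) for field in fields}
```

Fields with low scores are the first candidates for a sharper description in the schema or an extra rule in the instructions.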

Data Files

  • Keep data files small and focused (under 500KB each).
  • Update data files when reference data changes (product catalogs, customer lists).
  • Use .md files for rule documents — Markdown formatting helps the AI parse structured rules.
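
The size guideline is easy to check before uploading. This sketch assumes data files are kept in a local folder and treats 500KB as 500 × 1024 bytes; oversized_files is a hypothetical helper.

```python
from pathlib import Path

MAX_BYTES = 500 * 1024  # assumed interpretation of the 500KB guideline


def oversized_files(folder, limit: int = MAX_BYTES) -> list:
    """Return the names of files in folder that exceed the size limit, sorted."""
    return sorted(
        path.name
        for path in Path(folder).iterdir()
        if path.is_file() and path.stat().st_size > limit
    )
```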

IMPORTANT

Keep each template focused on a single document type. A template designed to handle both invoices and receipts in one schema will produce lower accuracy than two separate, purpose-built templates. The small overhead of managing additional templates is far outweighed by the improvement in extraction quality.


Summary

Task                       Where to Configure
Define extracted fields    Schema (JSON Schema format)
Choose AI model            Assistant Config > Provider + Model
Write extraction rules     Assistant Config > Instructions
Set output format          Result Format
Configure CSV columns      Result Format > csvColumnOrder
Add reference documents    Data Files
Generate images            Result Format > image + imageUserOptions

For API integration, see the API Reference for template management endpoints and how to trigger processing programmatically.