Architecture & Integration

Technical architecture overview and integration patterns for the DAT platform.

Tags: architecture, integration, api, technical

Published: 3/31/2026


Technical Overview

This page describes the technical architecture of the Ocriva Document Automation Transformer (DAT) platform — how it is structured, how documents move through it, and how to integrate with it from external systems.


Platform Architecture

Multi-Tenant Hierarchy

Ocriva uses a multi-tenant hierarchy where each Organization is a fully isolated tenant. No data crosses organizational boundaries.

Organization (tenant)
├── Project A
│   ├── Template: Invoice Extractor (JSON output)
│   ├── Template: Receipt Scanner (CSV output)
│   ├── Webhooks: → accounting-api.example.com
│   └── API Tokens: for accounting system integration
├── Project B
│   ├── Template: Contract Analyzer (Text output)
│   ├── Template: NDA Extractor (JSON output)
│   └── API Tokens: for legal document system
├── Project C
│   ├── Template: Employee Onboarding Pack (JSON output)
│   └── Webhooks: → hris.example.com
└── Billing & Credits
    ├── Stripe subscription
    ├── Credit balance
    └── Usage history

Organization — the top-level tenant. All billing, credits, and team membership are managed here. One organization per company is the common pattern, but multiple organizations per account are supported (useful for agencies managing multiple clients).

Project — logical groupings within an organization. Use projects to separate document use cases, departments, or client accounts. Access tokens and webhooks are configured per project.

Template — extraction configurations within a project. Each template specifies which AI model to use, which fields to extract, and what output format to produce. Multiple templates per project allow one project to handle different document types.

Application Services

The Ocriva platform runs as a set of services:

| Service | Technology | Responsibility |
| --- | --- | --- |
| API Server | NestJS (Node.js) | REST API, business logic, AI orchestration |
| Web Frontend | Next.js 14 (React) | User interface, document upload UI |
| WebSocket Server | NestJS + Socket.IO | Real-time event push to clients |
| CMS | Next.js 14 | Documentation and content management |

Processing Pipeline

Every document that enters Ocriva follows this pipeline:

Upload → Queue (pending) → AI Processing (in_progress) → Result (completed/failed)
   ↓                              ↓                              ↓
Webhook:                    Credit deduction              Webhook:
document.uploaded           (on start)                   document.processed
   ↓                                                           ↓
WebSocket:                                            WebSocket:
real-time UI update                                   result push to UI

Step-by-Step

  1. Upload — Client submits document via web UI, REST API, or LINE integration. Document is stored in Supabase Storage or Google Cloud Storage. A processing record is created with status pending.

  2. Queue — The processing record enters the queue. A document.uploaded webhook event fires (if configured). A WebSocket event notifies connected clients.

  3. AI Processing — The queue worker picks up the document. Status changes to in_progress. Credits are deducted. The document is sent to the configured AI provider along with the template's extraction instructions.

  4. Result — The AI returns extracted data. The processing record is updated to completed with the result payload. A document.processed webhook event fires. A WebSocket event pushes the result to connected clients.

  5. Failure handling — If the AI request fails, the record is marked failed with error details. A document.processed (with failed status) webhook fires. Transient failures trigger automatic retry.

NOTE

The processing queue runs on a cron cycle that polls for pending documents approximately every 5 seconds. This means there is a small inherent delay between document upload and the start of AI processing. For time-sensitive workflows, factor this into your latency expectations.
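If you poll for results over the REST API, size your polling interval around that queue cycle. A minimal polling sketch in Node.js: the helper takes an injected `fetchStatus` function (in practice a wrapper around `GET /processing-history/{processingId}`), and the interval and timeout values are illustrative, not documented defaults.

```javascript
// Poll a status-returning function until the record reaches a terminal
// state. `fetchStatus` is injected so the helper stays transport-agnostic;
// in real use it would call GET /processing-history/{processingId}.
async function pollUntilDone(fetchStatus, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const record = await fetchStatus();
    if (record.status === 'completed' || record.status === 'failed') {
      return record;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for processing to finish');
}
```

Polling tighter than the roughly 5-second queue cycle only adds request load without improving latency.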

Batch Processing

Batch submissions follow the same pipeline per document, with batch-level coordination:

Batch Submit (N documents)
    ↓
N individual processing records created
    ↓
Documents processed in parallel (up to concurrency limit)
    ↓
WebSocket: per-document status updates streamed live
    ↓
When all N documents complete: batch.completed webhook fires
    ↓
Batch export available (combined JSON/CSV)
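Client-side, the streamed per-document updates can be folded into a simple batch progress view. A sketch under an assumed event shape of `{ processingId, status }`; check the WebSocket event payloads for the actual field names.

```javascript
// Track per-document status events for one batch and report progress.
// The event shape ({ processingId, status }) is an assumption for
// illustration, not taken from the API spec.
function createBatchTracker(totalCount) {
  const statuses = new Map(); // processingId -> latest status
  return {
    onEvent({ processingId, status }) {
      statuses.set(processingId, status);
    },
    progress() {
      let completed = 0;
      let failed = 0;
      for (const s of statuses.values()) {
        if (s === 'completed') completed++;
        else if (s === 'failed') failed++;
      }
      return { completed, failed, pending: totalCount - completed - failed };
    },
    isDone() {
      const { completed, failed } = this.progress();
      return completed + failed === totalCount;
    },
  };
}
```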

Integration Patterns

Pattern 1: REST API (Pull-Based)

The simplest integration. Your system calls the Ocriva API to submit documents and retrieve results.

Submit a document:

POST /upload
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
 
file: [binary]
templateId: tmpl_abc123
projectId: proj_xyz

Check processing status:

GET /processing-history/{processingId}
Authorization: Bearer <api-token>

Response:

{
  "id": "proc_abc123",
  "status": "completed",
  "result": {
    "vendor": "Acme Corp",
    "invoice_number": "INV-2026-042",
    "total_amount": 15000.00,
    "due_date": "2026-04-30"
  },
  "createdAt": "2026-03-31T09:00:00Z",
  "completedAt": "2026-03-31T09:00:08Z"
}

Use this pattern when: your system initiates the request and can poll for results, or when you need synchronous-style integration.
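As a concrete sketch of this pattern, the submit call can be issued from Node.js 18+ using the built-in fetch and FormData. The endpoint and form fields follow the request shown above; the base URL, the file handling, and the injectable `fetchImpl` parameter (added for testability) are assumptions.

```javascript
// Submit one document for processing. Assumes Node.js 18+ for the global
// fetch, FormData, and Blob. `baseUrl` is a placeholder for your actual
// Ocriva API host; `fetchImpl` is injectable so the function can be tested
// without a network call.
async function submitDocument({
  baseUrl,
  apiToken,
  fileBuffer,
  fileName,
  templateId,
  projectId,
  fetchImpl = fetch,
}) {
  const form = new FormData();
  form.append('file', new Blob([fileBuffer]), fileName);
  form.append('templateId', templateId);
  form.append('projectId', projectId);

  const res = await fetchImpl(`${baseUrl}/upload`, {
    method: 'POST',
    // The multipart boundary header is set automatically from the FormData body.
    headers: { Authorization: `Bearer ${apiToken}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json(); // expected to include the processing record id
}
```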

Pattern 2: Webhook-Driven (Event-Based Automation)

The recommended pattern for production automation. Ocriva pushes results to your system as soon as processing completes — no polling required.

Configure webhook in project settings:

{
  "url": "https://your-system.example.com/ocriva-webhook",
  "events": ["document.processed", "batch.completed"],
  "secret": "your-webhook-signing-secret"
}

Webhook payload (document.processed):

{
  "event": "document.processed",
  "timestamp": "2026-03-31T10:15:00Z",
  "organizationId": "org_xyz",
  "projectId": "proj_abc",
  "processingId": "proc_123",
  "templateId": "tmpl_456",
  "status": "completed",
  "result": {
    "fields": {
      "vendor": "Acme Corp",
      "invoice_number": "INV-2026-042",
      "total_amount": 15000.00
    },
    "format": "json"
  }
}

Verify webhook signature (Node.js example):

const crypto = require('crypto');

function verifyWebhook(payload, signature, secret) {
  const expected = Buffer.from(
    `sha256=${crypto.createHmac('sha256', secret).update(payload).digest('hex')}`
  );
  const given = Buffer.from(signature);
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (given.length !== expected.length) return false;
  return crypto.timingSafeEqual(given, expected);
}

Use this pattern when: you need real-time data delivery and your system can receive HTTP POST requests.

Pattern 3: Batch Processing (High-Volume)

For scenarios where you need to process large volumes of documents and collect consolidated results.

Submit batch:

POST /upload/batch
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
 
files[]: [binary] (up to 50 files)
templateId: tmpl_abc123
projectId: proj_xyz

Receive batch completion webhook:

{
  "event": "batch.completed",
  "batchId": "batch_xyz",
  "totalCount": 50,
  "completedCount": 48,
  "failedCount": 2,
  "exportUrl": "https://storage.ocriva.com/exports/batch_xyz.csv"
}

Use this pattern when: you have high-volume processing needs, can afford slight latency, and want consolidated output.
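A `batch.completed` handler typically branches on the failure count: fetch the export, retry a handful of failures, or escalate when something systemic looks wrong. A pure sketch of that decision; the payload shape follows the webhook example above, and the 10% threshold is arbitrary.

```javascript
// Decide what to do with a finished batch. The payload fields match the
// batch.completed example; the retry/escalate threshold is illustrative.
function planBatchFollowUp(payload) {
  const { totalCount, completedCount, failedCount, exportUrl } = payload;
  const failureRate = totalCount === 0 ? 0 : failedCount / totalCount;
  return {
    fetchExport: completedCount > 0 ? exportUrl : null,
    retryFailed: failedCount > 0 && failureRate <= 0.1,
    // A high failure rate usually means a template or input problem,
    // not transient errors, so retrying blindly would waste credits.
    escalate: failureRate > 0.1,
  };
}
```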

Pattern 4: LINE Integration (Mobile Capture)

For field-based workflows where documents are captured on mobile devices.

LINE User → sends photo → LINE Official Account
    ↓
Ocriva LINE Bot receives image
    ↓
Routes to configured project/template
    ↓
Processes via AI extraction
    ↓
Returns result to LINE conversation (optional)
    ↓
Fires webhook to downstream system

Use this pattern when: documents are captured in the field (logistics, inspections, deliveries) and staff use LINE.


Security Model

Authentication

JWT (session-based):

  • Used by the web application
  • Tokens issued on login, stored in httpOnly cookies
  • Not accessible to JavaScript (XSS protection)
  • Short expiry with refresh token rotation

API Tokens:

  • Used for service-to-service integration
  • Generated per project in the web interface
  • Long-lived but revocable
  • Include in Authorization: Bearer <token> header
  • Scoped to a single project — cannot access other projects
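A small helper that enforces the token-handling rule by reading the token from the server-side environment at request time; the variable name `OCRIVA_API_TOKEN` is an assumed convention, not a documented one.

```javascript
// Build the Authorization header from a server-side environment variable.
// OCRIVA_API_TOKEN is an assumed name; the env object is injectable for
// testing, defaulting to process.env.
function authHeaders(env = process.env) {
  const token = env.OCRIVA_API_TOKEN;
  if (!token) {
    throw new Error(
      'OCRIVA_API_TOKEN is not set. Configure it server-side; never ship tokens in frontend code.'
    );
  }
  return { Authorization: `Bearer ${token}` };
}
```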

Data Isolation

  • All database queries are scoped to organizationId
  • Project-level API tokens cannot query resources of other projects within the same organization
  • File storage uses organization-scoped bucket paths
  • No cross-organization data leakage is possible at the query layer

Webhook Security

  • Webhook payloads are signed with HMAC-SHA256
  • Signature included in X-Ocriva-Signature header
  • Verify the signature on receipt to confirm payload authenticity
  • Shared secret configured in project webhook settings

Input Validation

  • All API inputs validated with class-validator (NestJS pipes)
  • File uploads: validated file type and size before storage
  • SQL/NoSQL injection prevention via Mongoose parameterized queries
  • Rate limiting on public endpoints

IMPORTANT

Store API tokens as server-side environment variables only. Never commit them to version control or expose them in client-side code. JWT session tokens are stored in httpOnly cookies and are never accessible to JavaScript, which protects against XSS attacks.


Tech Stack

| Layer | Technology | Notes |
| --- | --- | --- |
| Backend API | NestJS 11, TypeScript | Modular architecture, Swagger/OpenAPI |
| Database | MongoDB (Mongoose) | Document-oriented, flexible schemas |
| Frontend | Next.js 14, React 18, Tailwind CSS 3 | App Router, Server Components |
| WebSocket | NestJS + Socket.IO | Real-time event delivery |
| AI Providers | OpenAI, Google Gemini, Anthropic, DeepSeek, Qwen, Kimi | Per-template provider selection |
| File Storage | Supabase Storage / Google Cloud Storage | Organization-scoped bucket paths |
| Auth | Passport.js (JWT, Google OAuth2), Supabase Auth | Session management + social login |
| Payments | Stripe | Subscription and credit purchase |
| Email | Nodemailer | Transactional email |
| Secrets | Doppler | Environment variable management |
| Deployment | Docker | Each service has a Dockerfile |

API Reference Summary

The full API is documented at /api/docs (Swagger UI). Key endpoint groups:

| Group | Base Path | Description |
| --- | --- | --- |
| Auth | /auth | Login, register, OAuth, token refresh |
| Organizations | /organizations | CRUD for organizations |
| Projects | /projects | CRUD for projects |
| Templates | /templates | Template management |
| Upload | /upload | Document submission (single and batch) |
| Processing History | /processing-history | Query results and status |
| Analytics | /analytics | Usage statistics |
| Webhooks | /webhooks | Webhook configuration |
| API Tokens | /api-tokens | Token management |
| Credits | /credits | Balance and usage |
| LINE | /line | LINE integration configuration |

All endpoints require authentication. Use Authorization: Bearer <token> with an API token, or authenticate via the session cookie from the web interface.