Architecture & Integration

Technical architecture overview and integration patterns for the DAT platform.

Tags: architecture, integration, api, technical

Published: 3/31/2026


Technical Overview

This page describes the technical architecture of the Ocriva Document Automation Transformer (DAT) platform — how it is structured, how documents move through it, and how to integrate with it from external systems.


Platform Architecture

Multi-Tenant Hierarchy

Ocriva uses a multi-tenant hierarchy where each Organization is a fully isolated tenant. No data crosses organizational boundaries.

Organization (tenant)
├── Project A
│   ├── Template: Invoice Extractor (JSON output)
│   ├── Template: Receipt Scanner (CSV output)
│   ├── Webhooks: → accounting-api.example.com
│   └── API Tokens: for accounting system integration
├── Project B
│   ├── Template: Contract Analyzer (Text output)
│   ├── Template: NDA Extractor (JSON output)
│   └── API Tokens: for legal document system
├── Project C
│   ├── Template: Employee Onboarding Pack (JSON output)
│   └── Webhooks: → hris.example.com
└── Billing & Credits
    ├── Stripe subscription
    ├── Credit balance
    └── Usage history

Organization — the top-level tenant. All billing, credits, and team membership are managed here. One organization per company is the common pattern, but multiple organizations per account are supported (useful for agencies managing multiple clients).

Project — logical groupings within an organization. Use projects to separate document use cases, departments, or client accounts. Access tokens and webhooks are configured per project.

Template — extraction configurations within a project. Each template specifies which AI model to use, which fields to extract, and what output format to produce. Multiple templates per project allow one project to handle different document types.

Application Services

The Ocriva platform runs as a set of services:

| Service | Technology | Responsibility |
| --- | --- | --- |
| API Server | NestJS (Node.js) | REST API, business logic, AI orchestration |
| Web Frontend | Next.js 14 (React) | User interface, document upload UI |
| WebSocket Server | NestJS + Socket.IO | Real-time event push to clients |
| CMS | Next.js 14 | Documentation and content management |

Processing Pipeline

Every document that enters Ocriva follows this pipeline:

Upload → Queue (pending) → AI Processing (in_progress) → Result (completed/failed)
   ↓                              ↓                              ↓
Webhook:                    Credit deduction              Webhook:
document.uploaded           (on start)                   document.processed
   ↓                                                           ↓
WebSocket:                                            WebSocket:
real-time UI update                                   result push to UI

Step-by-Step

  1. Upload — Client submits document via web UI, REST API, or LINE integration. Document is stored in Supabase Storage or Google Cloud Storage. A processing record is created with status pending.

  2. Queue — The processing record enters the queue. A document.uploaded webhook event fires (if configured). A WebSocket event notifies connected clients.

  3. AI Processing — The queue worker picks up the document. Status changes to in_progress. Credits are deducted. The document is sent to the configured AI provider along with the template's extraction instructions.

  4. Result — The AI returns extracted data. The processing record is updated to completed with the result payload. A document.processed webhook event fires. A WebSocket event pushes the result to connected clients.

  5. Failure handling — If the AI request fails, the record is marked failed with error details. A document.processed (with failed status) webhook fires. Transient failures trigger automatic retry.

NOTE

The processing queue runs on a cron cycle that polls for pending documents approximately every 5 seconds. This means there is a small inherent delay between document upload and the start of AI processing. For time-sensitive workflows, factor this into your latency expectations.
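If you poll for results over the REST API, size your polling interval around that queue cycle. A minimal polling sketch in Node.js: the helper takes an injected `fetchStatus` function (in practice a wrapper around `GET /processing-history/{processingId}`), and the interval and timeout values are illustrative, not documented defaults.

```javascript
// Poll a status-returning function until the record reaches a terminal
// state. `fetchStatus` is injected so the helper stays transport-agnostic;
// in real use it would call GET /processing-history/{processingId}.
async function pollUntilDone(fetchStatus, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const record = await fetchStatus();
    if (record.status === 'completed' || record.status === 'failed') {
      return record;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for processing to finish');
}
```

Polling tighter than the roughly 5-second queue cycle only adds request load without improving latency.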

Batch Processing

Batch submissions follow the same pipeline per document, with batch-level coordination:

Batch Submit (N documents)
    ↓
N individual processing records created
    ↓
Documents processed in parallel (up to concurrency limit)
    ↓
WebSocket: per-document status updates streamed live
    ↓
When all N documents complete: batch.completed webhook fires
    ↓
Batch export available (combined JSON/CSV)
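Client-side, the streamed per-document updates can be folded into a simple batch progress view. A sketch under an assumed event shape of `{ processingId, status }`; check the WebSocket event payloads for the actual field names.

```javascript
// Track per-document status events for one batch and report progress.
// The event shape ({ processingId, status }) is an assumption for
// illustration, not taken from the API spec.
function createBatchTracker(totalCount) {
  const statuses = new Map(); // processingId -> latest status
  return {
    onEvent({ processingId, status }) {
      statuses.set(processingId, status);
    },
    progress() {
      let completed = 0;
      let failed = 0;
      for (const s of statuses.values()) {
        if (s === 'completed') completed++;
        else if (s === 'failed') failed++;
      }
      return { completed, failed, pending: totalCount - completed - failed };
    },
    isDone() {
      const { completed, failed } = this.progress();
      return completed + failed === totalCount;
    },
  };
}
```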

Integration Patterns

Pattern 1: REST API (Pull-Based)

The simplest integration. Your system calls the Ocriva API to submit documents and retrieve results.

Submit a document:

POST /upload
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
 
file: [binary]
templateId: tmpl_abc123
projectId: proj_xyz

Check processing status:

GET /processing-history/{processingId}
Authorization: Bearer <api-token>

Response:

{
  "id": "proc_abc123",
  "status": "completed",
  "result": {
    "vendor": "Acme Corp",
    "invoice_number": "INV-2026-042",
    "total_amount": 15000.00,
    "due_date": "2026-04-30"
  },
  "createdAt": "2026-03-31T09:00:00Z",
  "completedAt": "2026-03-31T09:00:08Z"
}

Use this pattern when: your system initiates the request and can poll for results, or when you need synchronous-style integration.
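As a concrete sketch of this pattern, the submit call can be issued from Node.js 18+ using the built-in fetch and FormData. The endpoint and form fields follow the request shown above; the base URL, the file handling, and the injectable `fetchImpl` parameter (added for testability) are assumptions.

```javascript
// Submit one document for processing. Assumes Node.js 18+ for the global
// fetch, FormData, and Blob. `baseUrl` is a placeholder for your actual
// Ocriva API host; `fetchImpl` is injectable so the function can be tested
// without a network call.
async function submitDocument({
  baseUrl,
  apiToken,
  fileBuffer,
  fileName,
  templateId,
  projectId,
  fetchImpl = fetch,
}) {
  const form = new FormData();
  form.append('file', new Blob([fileBuffer]), fileName);
  form.append('templateId', templateId);
  form.append('projectId', projectId);

  const res = await fetchImpl(`${baseUrl}/upload`, {
    method: 'POST',
    // The multipart boundary header is set automatically from the FormData body.
    headers: { Authorization: `Bearer ${apiToken}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json(); // expected to include the processing record id
}
```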

Pattern 2: Webhook-Driven (Event-Based Automation)

The recommended pattern for production automation. Ocriva pushes results to your system as soon as processing completes — no polling required.

Configure webhook in project settings:

{
  "url": "https://your-system.example.com/ocriva-webhook",
  "events": ["document.processed", "batch.completed"],
  "secret": "your-webhook-signing-secret"
}

Webhook payload (document.processed):

{
  "event": "document.processed",
  "timestamp": "2026-03-31T10:15:00Z",
  "organizationId": "org_xyz",
  "projectId": "proj_abc",
  "processingId": "proc_123",
  "templateId": "tmpl_456",
  "status": "completed",
  "result": {
    "fields": {
      "vendor": "Acme Corp",
      "invoice_number": "INV-2026-042",
      "total_amount": 15000.00
    },
    "format": "json"
  }
}

Verify webhook signature (Node.js example):

const crypto = require('crypto');

function verifyWebhook(payload, signature, secret) {
  const expected = Buffer.from(
    `sha256=${crypto.createHmac('sha256', secret).update(payload).digest('hex')}`
  );
  const given = Buffer.from(signature);
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (given.length !== expected.length) return false;
  return crypto.timingSafeEqual(given, expected);
}

Use this pattern when: you need real-time data delivery and your system can receive HTTP POST requests.

Pattern 3: Batch Processing (High-Volume)

For scenarios where you need to process large volumes of documents and collect consolidated results.

Submit batch:

POST /upload/batch
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
 
files[]: [binary] (up to 50 files)
templateId: tmpl_abc123
projectId: proj_xyz

Receive batch completion webhook:

{
  "event": "batch.completed",
  "batchId": "batch_xyz",
  "totalCount": 50,
  "completedCount": 48,
  "failedCount": 2,
  "exportUrl": "https://storage.ocriva.com/exports/batch_xyz.csv"
}

Use this pattern when: you have high-volume processing needs, can afford slight latency, and want consolidated output.
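A `batch.completed` handler typically branches on the failure count: fetch the export, retry a handful of failures, or escalate when something systemic looks wrong. A pure sketch of that decision; the payload shape follows the webhook example above, and the 10% threshold is arbitrary.

```javascript
// Decide what to do with a finished batch. The payload fields match the
// batch.completed example; the retry/escalate threshold is illustrative.
function planBatchFollowUp(payload) {
  const { totalCount, completedCount, failedCount, exportUrl } = payload;
  const failureRate = totalCount === 0 ? 0 : failedCount / totalCount;
  return {
    fetchExport: completedCount > 0 ? exportUrl : null,
    retryFailed: failedCount > 0 && failureRate <= 0.1,
    // A high failure rate usually means a template or input problem,
    // not transient errors, so retrying blindly would waste credits.
    escalate: failureRate > 0.1,
  };
}
```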

Pattern 4: LINE Integration (Mobile Capture)

For field-based workflows where documents are captured on mobile devices.

LINE User → sends photo → LINE Official Account
    ↓
Ocriva LINE Bot receives image
    ↓
Routes to configured project/template
    ↓
Processes via AI extraction
    ↓
Returns result to LINE conversation (optional)
    ↓
Fires webhook to downstream system

Use this pattern when: documents are captured in the field (logistics, inspections, deliveries) and staff use LINE.


Security Model

Authentication

JWT (session-based):

  • Used by the web application
  • Tokens issued on login, stored in httpOnly cookies
  • Not accessible to JavaScript (XSS protection)
  • Short expiry with refresh token rotation

API Tokens:

  • Used for service-to-service integration
  • Generated per project in the web interface
  • Long-lived but revocable
  • Include in Authorization: Bearer <token> header
  • Scoped to a single project — cannot access other projects
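A small helper that enforces the token-handling rule by reading the token from the server-side environment at request time; the variable name `OCRIVA_API_TOKEN` is an assumed convention, not a documented one.

```javascript
// Build the Authorization header from a server-side environment variable.
// OCRIVA_API_TOKEN is an assumed name; the env object is injectable for
// testing, defaulting to process.env.
function authHeaders(env = process.env) {
  const token = env.OCRIVA_API_TOKEN;
  if (!token) {
    throw new Error(
      'OCRIVA_API_TOKEN is not set. Configure it server-side; never ship tokens in frontend code.'
    );
  }
  return { Authorization: `Bearer ${token}` };
}
```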

Data Isolation

  • All database queries are scoped to organizationId
  • Project-level API tokens cannot query resources of other projects within the same organization
  • File storage uses organization-scoped bucket paths
  • No cross-organization data leakage is possible at the query layer

Webhook Security

  • Webhook payloads are signed with HMAC-SHA256
  • Signature included in X-Ocriva-Signature header
  • Verify the signature on receipt to confirm payload authenticity
  • Shared secret configured in project webhook settings

Input Validation

  • All API inputs validated with class-validator (NestJS pipes)
  • File uploads: validated file type and size before storage
  • SQL/NoSQL injection prevention via Mongoose parameterized queries
  • Rate limiting on public endpoints

IMPORTANT

Store API tokens as server-side environment variables only. Never commit them to version control or expose them in client-side code. JWT session tokens are stored in httpOnly cookies and are never accessible to JavaScript, which protects against XSS attacks.


Tech Stack

| Layer | Technology | Notes |
| --- | --- | --- |
| Backend API | NestJS 11, TypeScript | Modular architecture, Swagger/OpenAPI |
| Database | MongoDB (Mongoose) | Document-oriented, flexible schemas |
| Frontend | Next.js 14, React 18, Tailwind CSS 3 | App Router, Server Components |
| WebSocket | NestJS + Socket.IO | Real-time event delivery |
| AI Providers | OpenAI, Google Gemini, Anthropic, DeepSeek, Qwen, Kimi | Per-template provider selection |
| File Storage | Supabase Storage / Google Cloud Storage | Organization-scoped bucket paths |
| Auth | Passport.js (JWT, Google OAuth2), Supabase Auth | Session management + social login |
| Payments | Stripe | Subscription and credit purchase |
| Email | Nodemailer | Transactional email |
| Secrets | Doppler | Environment variable management |
| Deployment | Docker | Each service has a Dockerfile |

API Reference Summary

The full API is documented at /api/docs (Swagger UI). Key endpoint groups:

| Group | Base Path | Description |
| --- | --- | --- |
| Auth | /auth | Login, register, OAuth, token refresh |
| Organizations | /organizations | CRUD for organizations |
| Projects | /projects | CRUD for projects |
| Templates | /templates | Template management |
| Upload | /upload | Document submission (single and batch) |
| Processing History | /processing-history | Query results and status |
| Analytics | /analytics | Usage statistics |
| Webhooks | /webhooks | Webhook configuration |
| API Tokens | /api-tokens | Token management |
| Credits | /credits | Balance and usage |
| LINE | /line | LINE integration configuration |

All endpoints require authentication. Use Authorization: Bearer <token> with an API token, or authenticate via the session cookie from the web interface.