Architecture & Integration
Technical Overview
This page describes the technical architecture of the Ocriva Document Automation Transformer (DAT) platform — how it is structured, how documents move through it, and how to integrate with it from external systems.
Platform Architecture
Multi-Tenant Hierarchy
Ocriva uses a multi-tenant hierarchy where each Organization is a fully isolated tenant. No data crosses organizational boundaries.
Organization (tenant)
├── Project A
│ ├── Template: Invoice Extractor (JSON output)
│ ├── Template: Receipt Scanner (CSV output)
│ ├── Webhooks: → accounting-api.example.com
│ └── API Tokens: for accounting system integration
├── Project B
│ ├── Template: Contract Analyzer (Text output)
│ ├── Template: NDA Extractor (JSON output)
│ └── API Tokens: for legal document system
├── Project C
│ ├── Template: Employee Onboarding Pack (JSON output)
│ └── Webhooks: → hris.example.com
└── Billing & Credits
    ├── Stripe subscription
    ├── Credit balance
    └── Usage history

Organization — the top-level tenant. All billing, credits, and team membership are managed here. One organization per company is the common pattern, but multiple organizations per account are supported (useful for agencies managing multiple clients).
Project — logical groupings within an organization. Use projects to separate document use cases, departments, or client accounts. Access tokens and webhooks are configured per project.
Template — extraction configurations within a project. Each template specifies: which AI model to use, what fields to extract, what format to output. Multiple templates per project allow one project to handle different document types.
Application Services
The Ocriva platform runs as a set of services:
| Service | Technology | Responsibility |
|---|---|---|
| API Server | NestJS (Node.js) | REST API, business logic, AI orchestration |
| Web Frontend | Next.js 14 (React) | User interface, document upload UI |
| WebSocket Server | NestJS + Socket.IO | Real-time event push to clients |
| CMS | Next.js 14 | Documentation and content management |
Processing Pipeline
Every document that enters Ocriva follows this pipeline:
Upload → Queue (pending) → AI Processing (in_progress) → Result (completed/failed)

  Queue:          document.uploaded webhook fires; WebSocket pushes a real-time UI update
  AI Processing:  credits are deducted on start
  Result:         document.processed webhook fires; WebSocket pushes the result to the UI

Step-by-Step
1. Upload — Client submits a document via the web UI, REST API, or LINE integration. The document is stored in Supabase Storage or Google Cloud Storage. A processing record is created with status pending.
2. Queue — The processing record enters the queue. A document.uploaded webhook event fires (if configured). A WebSocket event notifies connected clients.
3. AI Processing — The queue worker picks up the document. Status changes to in_progress. Credits are deducted. The document is sent to the configured AI provider along with the template's extraction instructions.
4. Result — The AI returns extracted data. The processing record is updated to completed with the result payload. A document.processed webhook event fires. A WebSocket event pushes the result to connected clients.
5. Failure handling — If the AI request fails, the record is marked failed with error details. A document.processed webhook (with failed status) fires. Transient failures trigger an automatic retry.
NOTE
The processing queue runs on a cron cycle that polls for pending documents approximately every 5 seconds. This means there is a small inherent delay between document upload and the start of AI processing. For time-sensitive workflows, factor this into your latency expectations.
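The lifecycle above can be consumed as a simple status check on the processing record. A minimal sketch, assuming a consumer in Node.js (the action names are illustrative, not part of the API):

```javascript
// Map a processing record's status to what a consumer should do next.
// Statuses come from the pipeline: pending → in_progress → completed/failed.
function nextAction(record) {
  switch (record.status) {
    case 'pending':
    case 'in_progress':
      return 'wait'; // still queued or with the AI provider
    case 'completed':
      return 'consume_result'; // extracted data is in the result payload
    case 'failed':
      return 'inspect_error'; // transient failures are retried automatically
    default:
      throw new Error(`Unknown processing status: ${record.status}`);
  }
}
```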
Batch Processing
Batch submissions follow the same pipeline per document, with batch-level coordination:
Batch Submit (N documents)
↓
N individual processing records created
↓
Documents processed in parallel (up to concurrency limit)
↓
WebSocket: per-document status updates streamed live
↓
When all N documents complete: batch.completed webhook fires
↓
Batch export available (combined JSON/CSV)

Integration Patterns
Pattern 1: REST API (Pull-Based)
The simplest integration. Your system calls the Ocriva API to submit documents and retrieve results.
Submit a document:
POST /upload
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
file: [binary]
templateId: tmpl_abc123
projectId: proj_xyz

Check processing status:
GET /processing-history/{processingId}
Authorization: Bearer <api-token>

Response:
{
"id": "proc_abc123",
"status": "completed",
"result": {
"vendor": "Acme Corp",
"invoice_number": "INV-2026-042",
"total_amount": 15000.00,
"due_date": "2026-04-30"
},
"createdAt": "2026-03-31T09:00:00Z",
"completedAt": "2026-03-31T09:00:08Z"
}

Use this pattern when: your system initiates the request and can poll for results, or when you need synchronous-style integration.
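Polling can be wrapped in a small helper. The sketch below assumes a caller-supplied fetchStatus function (for example, one that GETs /processing-history/{processingId} with your API token and returns the parsed JSON); the default interval reflects the roughly 5-second queue cycle noted earlier.

```javascript
// Poll a processing record until it reaches a terminal status.
// fetchStatus is a hypothetical caller-supplied async function that returns
// the parsed response of GET /processing-history/{processingId}.
async function pollUntilDone(
  fetchStatus,
  processingId,
  { intervalMs = 3000, maxAttempts = 20 } = {}
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const record = await fetchStatus(processingId);
    // Terminal statuses in the pipeline are 'completed' and 'failed'
    if (record.status === 'completed' || record.status === 'failed') {
      return record;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Processing ${processingId} did not finish after ${maxAttempts} polls`);
}
```

Capping attempts avoids an unbounded loop if a record is stuck; tune the interval and cap to your latency budget.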
Pattern 2: Webhook-Driven (Event-Based Automation)
The recommended pattern for production automation. Ocriva pushes results to your system as soon as processing completes — no polling required.
Configure webhook in project settings:
{
"url": "https://your-system.example.com/ocriva-webhook",
"events": ["document.processed", "batch.completed"],
"secret": "your-webhook-signing-secret"
}

Webhook payload (document.processed):
{
"event": "document.processed",
"timestamp": "2026-03-31T10:15:00Z",
"organizationId": "org_xyz",
"projectId": "proj_abc",
"processingId": "proc_123",
"templateId": "tmpl_456",
"status": "completed",
"result": {
"fields": {
"vendor": "Acme Corp",
"invoice_number": "INV-2026-042",
"total_amount": 15000.00
},
"format": "json"
}
}

Verify webhook signature (Node.js example):
const crypto = require('crypto');

function verifyWebhook(payload, signature, secret) {
  // Recompute the HMAC over the raw request body
  const expected = `sha256=${crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex')}`;
  const received = Buffer.from(signature);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return (
    received.length === computed.length &&
    crypto.timingSafeEqual(received, computed)
  );
}

Use this pattern when: you need real-time data delivery and your system can receive HTTP POST requests.
Pattern 3: Batch Processing (High-Volume)
For scenarios where you need to process large volumes of documents and collect consolidated results.
Submit batch:
POST /upload/batch
Authorization: Bearer <api-token>
Content-Type: multipart/form-data
files[]: [binary] (up to 50 files)
templateId: tmpl_abc123
projectId: proj_xyz

Receive batch completion webhook:
{
"event": "batch.completed",
"batchId": "batch_xyz",
"totalCount": 50,
"completedCount": 48,
"failedCount": 2,
"exportUrl": "https://storage.ocriva.com/exports/batch_xyz.csv"
}

Use this pattern when: you have high-volume processing needs, can afford slight latency, and want consolidated output.
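On the receiving side, a batch.completed payload can be reduced to a go/no-go decision before downloading the export. A sketch, where the 5% failure threshold is an arbitrary example rather than platform policy:

```javascript
// Summarize a batch.completed webhook payload and flag batches whose
// failure rate exceeds a caller-chosen threshold (an assumption, not an API rule).
function summarizeBatch(payload, { maxFailureRate = 0.05 } = {}) {
  const failureRate =
    payload.totalCount > 0 ? payload.failedCount / payload.totalCount : 0;
  return {
    batchId: payload.batchId,
    failureRate,
    needsReview: failureRate > maxFailureRate,
    exportUrl: payload.exportUrl,
  };
}
```

A batch flagged needsReview might be routed to a human queue, while clean batches proceed straight to export ingestion.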
Pattern 4: LINE Integration (Mobile Capture)
For field-based workflows where documents are captured on mobile devices.
LINE User → sends photo → LINE Official Account
↓
Ocriva LINE Bot receives image
↓
Routes to configured project/template
↓
Processes via AI extraction
↓
Returns result to LINE conversation (optional)
↓
Fires webhook to downstream system

Use this pattern when: documents are captured in the field (logistics, inspections, deliveries) and staff use LINE.
Security Model
Authentication
JWT (session-based):
- Used by the web application
- Tokens issued on login, stored in httpOnly cookies
- Not accessible to JavaScript (XSS protection)
- Short expiry with refresh token rotation
API Tokens:
- Used for service-to-service integration
- Generated per project in the web interface
- Long-lived but revocable
- Included in the Authorization: Bearer <token> header
- Scoped to a single project — cannot access other projects
Data Isolation
- All database queries are scoped to organizationId
- Project-level API tokens cannot query resources of other projects within the same organization
- File storage uses organization-scoped bucket paths
- No cross-organization data leakage is possible at the query layer
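The query-layer guarantee can be pictured as a small helper that every data-access call goes through. This is an illustrative sketch of the pattern, not Ocriva's actual code:

```javascript
// Merge the tenant's organizationId into every query filter. Because the
// caller's filter is spread first, the trusted organizationId always wins,
// so a request cannot smuggle in another tenant's id.
function scopedFilter(organizationId, filter = {}) {
  if (!organizationId) {
    throw new Error('organizationId is required for every query');
  }
  return { ...filter, organizationId };
}
```

The resulting object would be passed to the ODM (e.g. a Mongoose find), ensuring no query ever runs without a tenant scope.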
Webhook Security
- Webhook payloads are signed with HMAC-SHA256
- Signature included in the X-Ocriva-Signature header
- Verify the signature on receipt to confirm payload authenticity
- Shared secret configured in project webhook settings
Input Validation
- All API inputs validated with class-validator (NestJS pipes)
- File uploads: validated file type and size before storage
- SQL/NoSQL injection prevention via Mongoose parameterized queries
- Rate limiting on public endpoints
IMPORTANT
Store API tokens as server-side environment variables only — never commit them to version control or expose them in frontend code. JWT session tokens are stored in httpOnly cookies and are never accessible to JavaScript, protecting against XSS attacks.
Tech Stack
| Layer | Technology | Notes |
|---|---|---|
| Backend API | NestJS 11, TypeScript | Modular architecture, Swagger/OpenAPI |
| Database | MongoDB (Mongoose) | Document-oriented, flexible schemas |
| Frontend | Next.js 14, React 18, Tailwind CSS 3 | App Router, Server Components |
| WebSocket | NestJS + Socket.IO | Real-time event delivery |
| AI Providers | OpenAI, Google Gemini, Anthropic, DeepSeek, Qwen, Kimi | Per-template provider selection |
| File Storage | Supabase Storage / Google Cloud Storage | Organization-scoped bucket paths |
| Auth | Passport.js (JWT, Google OAuth2), Supabase Auth | Session management + social login |
| Payments | Stripe | Subscription and credit purchase |
| Email | Nodemailer | Transactional email |
| Secrets | Doppler | Environment variable management |
| Deployment | Docker | Each service has a Dockerfile |
API Reference Summary
The full API is documented at /api/docs (Swagger UI). Key endpoint groups:
| Group | Base Path | Description |
|---|---|---|
| Auth | /auth | Login, register, OAuth, token refresh |
| Organizations | /organizations | CRUD for organizations |
| Projects | /projects | CRUD for projects |
| Templates | /templates | Template management |
| Upload | /upload | Document submission (single and batch) |
| Processing History | /processing-history | Query results and status |
| Analytics | /analytics | Usage statistics |
| Webhooks | /webhooks | Webhook configuration |
| API Tokens | /api-tokens | Token management |
| Credits | /credits | Balance and usage |
| LINE | /line | LINE integration configuration |
All endpoints require authentication. Use Authorization: Bearer <token> with an API token, or authenticate via the session cookie from the web interface.
