Platform — clouds, LLMs, and OCR

Hosted SaaS: Cloud accounts and LLM keys are provided for you.

Self-hosted: You supply cloud accounts and LLM keys.

Supported clouds

Each cloud is configured under Settings → Account → Development. Configure only the clouds whose services you use.

AWS

Services: S3 (document storage), Textract (OCR), Bedrock (LLMs and embeddings).

Configuration: IAM user access key ID, secret access key, and S3 bucket name.
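You can sanity-check the three AWS values before saving them. The sketch below is a minimal, hypothetical pre-flight check, not DocRouter's own validation. `AwsSettings` and `validate_aws_settings` are illustrative names. The bucket check covers only a simplified subset of the S3 general-purpose bucket naming rules.

```python
import re
from dataclasses import dataclass

# Simplified subset of the S3 general-purpose bucket naming rules:
# 3-63 characters; lowercase letters, digits, dots, and hyphens;
# must begin and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


@dataclass
class AwsSettings:
    """Hypothetical container for the three values entered in Settings."""
    access_key_id: str
    secret_access_key: str
    s3_bucket_name: str


def validate_aws_settings(cfg: AwsSettings) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if not cfg.access_key_id.strip():
        problems.append("access key ID is empty")
    if not cfg.secret_access_key.strip():
        problems.append("secret access key is empty")
    if not _BUCKET_RE.fullmatch(cfg.s3_bucket_name):
        problems.append(f"invalid S3 bucket name: {cfg.s3_bucket_name!r}")
    elif ".." in cfg.s3_bucket_name:
        problems.append("S3 bucket name must not contain consecutive dots")
    return problems
```

A format check cannot prove that the key can actually reach the bucket. A live test would call S3, for example with boto3's `head_bucket(Bucket=...)`.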

GCP

Services: Vertex AI (Gemini and embeddings).

Configuration: Google Cloud service account JSON key.
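A service account JSON key is a single JSON object. The sketch below parses the key and checks a few fields that Google Cloud service account keys contain. This catches an obviously wrong file, such as a user credential, before you upload it. `check_service_account_key` is a hypothetical helper, not part of DocRouter. It checks structure only and never contacts Google Cloud.

```python
import json

# A few fields present in every Google Cloud service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> dict:
    """Parse a service account key; raise ValueError if it looks wrong."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(key, dict):
        raise ValueError("key file must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not key.get(f)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key
```

To authenticate with the key, the google-auth library can build credentials from the parsed dict with `google.oauth2.service_account.Credentials.from_service_account_info(key)`.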

Azure

Services: Azure OpenAI and Microsoft Foundry (Azure AI) LLMs.

Configuration: Microsoft Entra service principal (tenant ID, client ID, client secret) and the Foundry service API base URL.
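A Microsoft Entra service principal authenticates with the OAuth 2.0 client-credentials flow. The sketch below builds the token request from the tenant ID, client ID, and client secret but does not send it. The `https://cognitiveservices.azure.com/.default` scope is an assumption: it is the scope commonly used for Azure OpenAI and Azure AI services. DocRouter may obtain tokens differently, for example through an SDK.

```python
import urllib.parse
import urllib.request

# Assumed scope for Azure OpenAI / Azure AI services access tokens.
COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = COGNITIVE_SERVICES_SCOPE) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    tenant = urllib.parse.quote(tenant_id)
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    # The token endpoint expects a form-encoded POST body.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` returns JSON whose `access_token` is then passed as a `Bearer` token to the Azure endpoint at the configured API base URL.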

Deployment

Self-hosted DocRouter installs via a Kubernetes Helm chart or Docker Compose. See Deploying Doc Router on Kubernetes and the open source page.

Supported LLM providers

First-class provider entries in the open-source product include:

Anthropic (Claude)
OpenAI (chat and embedding models)
Gemini (Google AI Studio)
Google Vertex AI — requires GCP when used
AWS Bedrock — requires AWS when used
Azure OpenAI
Microsoft Foundry — requires Azure when used
Mistral
Groq
OpenRouter
xAI

The exact default model lists change between releases. For the authoritative catalog, see get_llm_providers() in the DocRouter source (packages/python/analytiq_data/llm/providers.py).
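Three providers in the list above depend on a cloud account. The mapping below is a minimal sketch of that dependency. It shows which clouds a self-hosted install must configure for a given set of providers. The provider identifiers (`vertex_ai`, `bedrock`, `azure_ai`) are hypothetical; the real identifiers come from `get_llm_providers()`.

```python
# Hypothetical provider identifiers. The real identifiers come from
# get_llm_providers() in packages/python/analytiq_data/llm/providers.py.
PROVIDER_CLOUD = {
    "vertex_ai": "GCP",   # Google Vertex AI
    "bedrock": "AWS",     # AWS Bedrock
    "azure_ai": "Azure",  # Microsoft Foundry
}


def clouds_required(enabled_providers: set[str]) -> set[str]:
    """Clouds that must be configured for the enabled providers.

    Providers that are not in PROVIDER_CLOUD add no cloud requirement.
    """
    return {PROVIDER_CLOUD[p] for p in enabled_providers if p in PROVIDER_CLOUD}


def missing_clouds(enabled_providers: set[str], configured_clouds: set[str]) -> set[str]:
    """Clouds still to be set up under Settings -> Account -> Development."""
    return clouds_required(enabled_providers) - configured_clouds
```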

Supported OCR algorithms

Organization admins choose one OCR mode per organization; the pipeline runs that engine on the document PDF and stores a normalized OCR payload for downstream extraction and search.
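This page does not describe the format of the normalized OCR payload. As an illustration only, the sketch below assumes a hypothetical per-page shape, `{"page": n, "text": ...}`, and shows how two different engine outputs could be folded into it. Textract responses carry a list of `Blocks`, from which the sketch keeps the `LINE` blocks. Mistral OCR JSON is assumed to carry a `pages` list with a 0-based `index` and a `markdown` string per page.

```python
from collections import defaultdict


def normalize_textract(response: dict) -> list[dict]:
    """Join Textract LINE blocks into one text string per page (hypothetical shape)."""
    lines_by_page = defaultdict(list)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Single-page responses may omit "Page"; assume page 1 then.
            lines_by_page[block.get("Page", 1)].append(block.get("Text", ""))
    return [{"page": p, "text": "\n".join(lines)}
            for p, lines in sorted(lines_by_page.items())]


def normalize_mistral(response: dict) -> list[dict]:
    """Use each page's markdown from Mistral OCR JSON, converting to 1-based pages."""
    pages = sorted(response.get("pages", []), key=lambda pg: pg["index"])
    return [{"page": pg["index"] + 1, "text": pg.get("markdown", "")} for pg in pages]
```

`normalize_textract` omits pages that produce no `LINE` blocks. A production normalizer would also keep layout, table, and form data rather than flat text.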

All OCR modes are enabled in the hosted SaaS version of DocRouter at https://app.docrouter.ai/. When DocRouter is installed on-prem, each mode has the following requirements:

| Mode | What it does |
|---|---|
| `textract` | Amazon Textract `AnalyzeDocument`, with configurable feature types (e.g. `LAYOUT`, `TABLES`, `FORMS`, `SIGNATURES`). Requires AWS. |
| `mistral` | Mistral OCR via the Mistral API (model `mistral-ocr-latest` in product code). Returns Mistral OCR JSON. Requires the Mistral LLM provider. |
| `mistral-vertex` | Mistral OCR via GCP (model `mistral-ocr-2505` in product code). Returns Mistral OCR JSON. Requires GCP. |
| `llm` | Vision LLM OCR: uses a LiteLLM provider and model to produce per-page markdown. Gemini models perform best for LLM OCR. |
| `pymupdf` | PyMuPDF: extracts only the text embedded in the PDF (no cloud OCR). No vendor cloud required. |
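The per-mode requirements above can be expressed as data. This makes it straightforward to show admins only the OCR modes a given install can run. This is a hypothetical sketch, not DocRouter code: the `cloud:` and `provider:` labels are illustrative. For the `llm` mode, it assumes the only requirement is the LLM provider chosen for OCR.

```python
from typing import Optional

# Fixed requirements per OCR mode, transcribed from the mode descriptions.
# "cloud:X" = a configured cloud account; "provider:X" = a configured LLM provider.
FIXED_REQUIREMENTS = {
    "textract": {"cloud:AWS"},
    "mistral": {"provider:mistral"},
    "mistral-vertex": {"cloud:GCP"},
    "pymupdf": set(),  # embedded PDF text only, nothing to configure
}
MODES = ("textract", "mistral", "mistral-vertex", "llm", "pymupdf")


def ocr_mode_requirements(mode: str, llm_provider: Optional[str] = None) -> set:
    """Return the set of things a mode needs before it can run."""
    if mode == "llm":
        # Assumption: the vision-LLM mode needs only the provider chosen for OCR.
        if not llm_provider:
            raise ValueError("the 'llm' mode needs an LLM provider")
        return {f"provider:{llm_provider}"}
    if mode not in FIXED_REQUIREMENTS:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    return set(FIXED_REQUIREMENTS[mode])


def available_modes(configured: set, llm_provider: Optional[str] = None) -> list:
    """List the OCR modes whose requirements are all satisfied, in table order."""
    modes = []
    for mode in MODES:
        if mode == "llm" and not llm_provider:
            continue  # no provider chosen for LLM OCR
        if ocr_mode_requirements(mode, llm_provider) <= configured:
            modes.append(mode)
    return modes
```

A cloud-hosted LLM provider used for `llm` OCR, such as Vertex AI, would also need its cloud configured. The sketch leaves that out.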