Platform — clouds, LLMs, and OCR
Hosted SaaS: Cloud accounts and LLM keys are provided for you.
Self-hosted: You supply cloud accounts and LLM keys.
Supported clouds
Each cloud is configured under Settings → Account → Development. Configure only the clouds whose services you use.
AWS
Services: S3 (document storage), Textract (OCR), Bedrock (LLMs and embeddings).
Configuration: IAM user access key ID, secret access key, and S3 bucket name.
GCP
Services: Vertex AI (Gemini and embeddings).
Configuration: Google Cloud service account JSON key.
Azure
Services: Azure OpenAI and Microsoft Foundry (Azure AI) LLMs.
Configuration: Microsoft Entra service principal (tenant ID, client ID, client secret) and the Foundry service API base URL.
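A self-hosted install could check each credential set listed above before enabling a cloud. The following is a minimal sketch of such a check. The field names (`access_key_id`, `foundry_api_base_url`, and so on) and the `missing_fields` helper are hypothetical, chosen for illustration; they are not DocRouter's actual setting keys.

```python
# Hypothetical field names; DocRouter's real setting keys may differ.
REQUIRED_FIELDS = {
    "aws": ("access_key_id", "secret_access_key", "s3_bucket_name"),
    "gcp": ("service_account_json",),
    "azure": ("tenant_id", "client_id", "client_secret", "foundry_api_base_url"),
}


def missing_fields(cloud: str, settings: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or empty for one cloud."""
    return [name for name in REQUIRED_FIELDS[cloud] if not settings.get(name)]


if __name__ == "__main__":
    # Only two of the four Azure values are supplied here.
    print(missing_fields("azure", {"tenant_id": "t", "client_id": "c"}))
```

An empty result means the cloud has every value the configuration lines above call for; it does not confirm that the credentials are valid.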
Deployment
Self-hosted DocRouter installs via a Kubernetes Helm chart or Docker Compose. See Deploying Doc Router on Kubernetes and the open source page.
Supported LLM providers
First-class provider entries in the open-source product include:
- Anthropic (Claude)
- OpenAI (chat and embedding models)
- Gemini (Google AI Studio)
- Google Vertex AI — requires GCP when used
- AWS Bedrock — requires AWS when used
- Azure OpenAI
- Microsoft Foundry — requires Azure when used
- Mistral
- Groq
- OpenRouter
- xAI
The exact default model lists change between releases. For the authoritative catalog, see get_llm_providers() in the DocRouter source (packages/python/analytiq_data/llm/providers.py).
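Because only three entries in the list are marked as requiring a cloud, working out which clouds to configure for a given set of providers is a small lookup. The sketch below illustrates it under stated assumptions: the provider identifiers (`vertex_ai`, `bedrock`, `microsoft_foundry`) and the `clouds_needed` helper are hypothetical and are not taken from `providers.py`.

```python
# Hypothetical provider identifiers; the authoritative names are in providers.py.
# Only the providers marked "requires ... when used" in the list appear here.
PROVIDER_CLOUD = {
    "vertex_ai": "gcp",
    "bedrock": "aws",
    "microsoft_foundry": "azure",
}


def clouds_needed(providers: list[str]) -> set[str]:
    """Return the clouds that must be configured for the chosen providers."""
    return {PROVIDER_CLOUD[p] for p in providers if p in PROVIDER_CLOUD}


if __name__ == "__main__":
    print(sorted(clouds_needed(["anthropic", "bedrock", "vertex_ai"])))
```

Providers that are absent from the mapping, such as Anthropic or Groq in this sketch, contribute no cloud requirement.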
Supported OCR algorithms
Organization admins choose one OCR mode per organization; the pipeline runs that engine on the document PDF and stores a normalized OCR payload for downstream extraction and search.
All OCR modes are enabled in the SaaS version of DocRouter at https://app.docrouter.ai/. For on-prem installations, each OCR mode has the following requirements:
| Mode | What it does |
|---|---|
| textract | Amazon Textract AnalyzeDocument. Configurable feature types (e.g. LAYOUT, TABLES, FORMS, SIGNATURES). Requires AWS. |
| mistral | Mistral OCR via the Mistral API (model mistral-ocr-latest in product code). Returns Mistral OCR JSON. Requires Mistral provider. |
| mistral-vertex | Mistral OCR via GCP (model mistral-ocr-2505 in product code). Returns Mistral OCR JSON. Requires GCP. |
| llm | Vision LLM OCR: uses a LiteLLM provider and model for per-page markdown. Gemini models are best performing for LLM OCR. |
| pymupdf | PyMuPDF: embedded text from the PDF only (no cloud OCR). No vendor cloud required. |
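
The requirements in the table can be read as one availability rule per mode. The sketch below encodes them that way for an on-prem install. The `available_ocr_modes` helper and the cloud and provider identifiers are hypothetical, and the rule for `llm` (any configured LLM provider is enough) is a simplifying assumption rather than documented behavior.

```python
# One availability check per OCR mode, transcribed from the table.
# Cloud and provider identifiers are hypothetical labels for this sketch.
OCR_MODE_CHECKS = {
    "textract": lambda clouds, providers: "aws" in clouds,
    "mistral": lambda clouds, providers: "mistral" in providers,
    "mistral-vertex": lambda clouds, providers: "gcp" in clouds,
    # Assumption: any configured LLM provider satisfies the llm mode.
    "llm": lambda clouds, providers: bool(providers),
    # pymupdf reads embedded PDF text and needs no vendor cloud.
    "pymupdf": lambda clouds, providers: True,
}


def available_ocr_modes(clouds: set[str], providers: set[str]) -> list[str]:
    """Return the OCR modes whose requirement is met, in table order."""
    return [
        mode
        for mode, check in OCR_MODE_CHECKS.items()
        if check(clouds, providers)
    ]


if __name__ == "__main__":
    print(available_ocr_modes({"aws"}, {"mistral"}))
```

With no clouds and no providers configured, only `pymupdf` remains available in this sketch.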