Platform — clouds, LLMs, and OCR
Hosted SaaS: Cloud accounts and LLM keys are provided for you.
Self-hosted: You supply cloud accounts and LLM keys.
Supported clouds
Each cloud is configured under Settings → Account → Development. Configure only the clouds whose services you use.
AWS
Services: S3 (document storage), Textract (OCR), Bedrock (LLMs and embeddings).
Configuration: IAM user access key ID, secret access key, and S3 bucket name.
GCP
Services: Vertex AI (Gemini and embeddings).
Configuration: Google Cloud service account JSON key.
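Before pasting a service account key into the settings page, it can help to sanity-check the file locally. The sketch below is not part of DocRouter; the function name `check_service_account_key` is hypothetical. It only checks for fields that a standard Google Cloud service account key file contains and does not contact Google.

```python
import json

# Fields present in a standard Google Cloud service account JSON key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key; an empty list means it looks usable.

    Hypothetical helper for illustration only. It validates structure,
    not whether the key is still active or has the right IAM roles.
    """
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]

    # A user credential or other key type will not work as a service account key.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']!r}")
    return problems
```

An empty result means only that the file has the expected shape; permissions for Vertex AI still have to be granted to the service account in GCP.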
Azure
Services: Azure OpenAI and Microsoft Foundry (Azure AI) LLMs.
Configuration: Microsoft Entra service principal (tenant ID, client ID, client secret) and the Foundry service API base URL.
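The per-cloud settings listed in this section can be summarized as a small checklist. The following is a minimal sketch, assuming hypothetical setting names (`access_key_id`, `foundry_api_base_url`, and so on); DocRouter's own field names may differ, so treat it as a way to reason about what each cloud needs rather than as the product's schema.

```python
# Required settings per cloud, taken from the "Configuration" lines in this section.
# The key names are hypothetical and chosen only for illustration.
REQUIRED_SETTINGS = {
    "aws": ("access_key_id", "secret_access_key", "s3_bucket_name"),
    "gcp": ("service_account_json",),
    "azure": ("tenant_id", "client_id", "client_secret", "foundry_api_base_url"),
}


def missing_settings(cloud: str, provided: dict) -> list[str]:
    """Return the required setting names that are absent or blank for one cloud."""
    if cloud not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown cloud: {cloud!r}")
    return [
        name
        for name in REQUIRED_SETTINGS[cloud]
        if not (provided.get(name) or "").strip()
    ]


# Example: an Azure configuration with only part of the service principal filled in.
print(missing_settings("azure", {"tenant_id": "t", "client_id": "c"}))
# prints ['client_secret', 'foundry_api_base_url']
```

Because only the clouds you actually use need to be configured, a check like this would be run per cloud, not across all three at once.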
Deployment
Self-hosted DocRouter installs via a Kubernetes Helm chart or Docker Compose. See Deploying Doc Router on Kubernetes and the open source page.
Supported LLM providers
First-class provider entries in the open-source product include:
- Anthropic (Claude)
- OpenAI (chat and embedding models)
- Gemini (Google AI Studio)
- Google Vertex AI — requires GCP when used
- AWS Bedrock — requires AWS when used
- Azure OpenAI
- Microsoft Foundry — requires Azure when used
- Mistral
- Groq
- OpenRouter
- xAI
The exact default model lists change between releases. For the authoritative catalog, see get_llm_providers() in the DocRouter source (packages/python/analytiq_data/llm/providers.py).
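Three of the providers in the list are marked as requiring a cloud account. As an illustrative sketch (the names `PROVIDER_CLOUD` and `clouds_needed` are hypothetical and do not come from the DocRouter source), the clouds to configure for a given set of providers can be worked out like this:

```python
# Providers from the list above that are explicitly marked "requires <cloud> when used".
# Providers not in this mapping carry no such marker in that list.
PROVIDER_CLOUD = {
    "Google Vertex AI": "GCP",
    "AWS Bedrock": "AWS",
    "Microsoft Foundry": "Azure",
}


def clouds_needed(providers: list[str]) -> list[str]:
    """Return the sorted, de-duplicated clouds required by the given providers."""
    return sorted({PROVIDER_CLOUD[p] for p in providers if p in PROVIDER_CLOUD})


# Example: Anthropic needs no cloud account; Bedrock and Vertex AI each need one.
print(clouds_needed(["Anthropic", "AWS Bedrock", "Google Vertex AI"]))
# prints ['AWS', 'GCP']
```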
Supported OCR algorithms
Organization admins choose one OCR mode per organization; the pipeline runs that engine on the document PDF and stores a normalized OCR payload for downstream extraction and search.
| Mode | What it does |
|---|---|
| textract | Amazon Textract AnalyzeDocument. Self-hosted: configurable feature types (e.g. LAYOUT, TABLES, FORMS, SIGNATURES) and AWS credentials/IAM for Textract and S3 as used by your deployment. Requires AWS. |
| mistral | Mistral OCR via the Mistral API (model mistral-ocr-latest in product code). Returns Mistral OCR JSON (pages and layout-oriented content). Requires Mistral. |
| llm | Vision LLM OCR: uses a LiteLLM provider and model for per-page markdown (or equivalent). Self-hosted: you configure provider, model, and credentials. SaaS: not something you configure in your tenant; processing is fully managed. |
| pymupdf | PyMuPDF: embedded text from the PDF only (no cloud OCR). No vendor cloud required. |
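The four modes differ mainly in what external service each one depends on. The sketch below restates the table as a lookup; the `OcrMode` enum and `requirement_for` function are hypothetical names for illustration, not DocRouter's API, and only the mode strings themselves come from the table.

```python
from enum import Enum
from typing import Optional


class OcrMode(str, Enum):
    """The four OCR mode identifiers listed in the table."""

    TEXTRACT = "textract"
    MISTRAL = "mistral"
    LLM = "llm"
    PYMUPDF = "pymupdf"


# External dependency per mode, as stated in the table; None means no vendor cloud.
OCR_REQUIREMENT = {
    OcrMode.TEXTRACT: "AWS",
    OcrMode.MISTRAL: "Mistral",
    OcrMode.LLM: "LiteLLM provider and model",
    OcrMode.PYMUPDF: None,
}


def requirement_for(mode: str) -> Optional[str]:
    """Return the external dependency for a mode; raises ValueError on unknown modes."""
    return OCR_REQUIREMENT[OcrMode(mode)]


print(requirement_for("textract"))  # prints AWS
print(requirement_for("pymupdf"))   # prints None
```

Since pymupdf reads only the text already embedded in the PDF, it suits digitally generated documents; scanned pages without a text layer would need one of the other three modes.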