Platform — clouds, LLMs, and OCR

Hosted SaaS: Cloud accounts and LLM keys are provided for you.

Self-hosted: You supply cloud accounts and LLM keys.

Supported clouds

Each cloud is configured under Settings → Account → Development. Configure only the clouds whose services you use.

AWS

Services: S3 (document storage), Textract (OCR), Bedrock (LLMs and embeddings).

Configuration: IAM user access key ID, secret access key, and S3 bucket name.

GCP

Services: Vertex AI (Gemini and embeddings).

Configuration: Google Cloud service account JSON key.

Azure

Services: Azure OpenAI and Microsoft Foundry (Azure AI) LLMs.

Configuration: Microsoft Entra service principal (tenant ID, client ID, client secret) and the Foundry service API base URL.
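As a rough illustration of the per-cloud configuration described above, the following sketch checks which required values are still missing before a cloud is saved. The field names (for example access_key_id and api_base_url) and the function missing_fields are hypothetical and chosen only for this example; the actual setting names in DocRouter may differ.

```python
from typing import Dict, List, Tuple

# Required values per cloud, taken from the "Configuration" lines above.
# The key names are hypothetical; DocRouter may name its settings differently.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "aws": ("access_key_id", "secret_access_key", "s3_bucket_name"),
    "gcp": ("service_account_json",),
    "azure": ("tenant_id", "client_id", "client_secret", "api_base_url"),
}


def missing_fields(cloud: str, settings: dict) -> List[str]:
    """Return the required fields that are absent or blank for one cloud."""
    if cloud not in REQUIRED_FIELDS:
        raise ValueError(f"unknown cloud: {cloud!r}")
    return [
        name
        for name in REQUIRED_FIELDS[cloud]
        # Treat None, empty strings, and whitespace-only strings as missing.
        if not str(settings.get(name) or "").strip()
    ]
```

Calling missing_fields("aws", {...}) with only the two key values set would report the bucket name as missing, which mirrors the rule that each cloud needs its full set of values before its services can be used.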


Deployment

Self-hosted DocRouter is installed with a Kubernetes Helm chart or with Docker Compose. See Deploying Doc Router on Kubernetes and the open source page.


Supported LLM providers

First-class provider entries in the open-source product include:

  • Anthropic (Claude)
  • OpenAI (chat and embedding models)
  • Gemini (Google AI Studio)
  • Google Vertex AI — requires GCP when used
  • AWS Bedrock — requires AWS when used
  • Azure OpenAI
  • Microsoft Foundry — requires Azure when used
  • Mistral
  • Groq
  • OpenRouter
  • xAI

The exact default model lists change between releases. For the authoritative catalog, see get_llm_providers() in the DocRouter source (packages/python/analytiq_data/llm/providers.py).
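The cloud dependencies noted in the provider list can be sketched as a small lookup that tells you which clouds to configure for a chosen set of providers. This is not the output of get_llm_providers(); the provider identifiers and the helper clouds_to_configure are hypothetical, and providers without a "requires" note in the list are assumed here to need only their own credentials.

```python
from typing import Dict, Iterable, Set

# Only the three providers the list marks as requiring a cloud account.
# Identifiers are hypothetical, not DocRouter's actual provider keys.
CLOUD_REQUIRED_BY: Dict[str, str] = {
    "vertex_ai": "gcp",
    "bedrock": "aws",
    "microsoft_foundry": "azure",
}


def clouds_to_configure(providers: Iterable[str]) -> Set[str]:
    """Return the clouds that must be configured for the given providers."""
    # Providers absent from the mapping are assumed to need no cloud account.
    return {CLOUD_REQUIRED_BY[p] for p in providers if p in CLOUD_REQUIRED_BY}
```

For example, a deployment that enables only Anthropic and AWS Bedrock would need just the AWS configuration, consistent with configuring only the clouds whose services you use.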


Supported OCR algorithms

Organization admins choose one OCR mode per organization; the pipeline runs that engine on the document PDF and stores a normalized OCR payload for downstream extraction and search.

Mode | What it does
--- | ---
textract | Amazon Textract AnalyzeDocument. Self-hosted: configurable feature types (e.g. LAYOUT, TABLES, FORMS, SIGNATURES) and AWS credentials/IAM for Textract and S3 as used by your deployment. Requires AWS.
mistral | Mistral OCR via the Mistral API (model mistral-ocr-latest in product code). Returns Mistral OCR JSON (pages and layout-oriented content). Requires Mistral.
llm | Vision LLM OCR. Uses a LiteLLM provider and model for per-page markdown (or equivalent). Self-hosted: you configure provider, model, and credentials. SaaS: not configurable in your tenant; processing is fully managed.
pymupdf | PyMuPDF. Extracts embedded text from the PDF only (no cloud OCR). No vendor cloud required.
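The one-mode-per-organization choice and the external dependency of each mode can be sketched as an enum plus a lookup. The mode strings come from the table above; the names OcrMode, OCR_REQUIREMENT, and ocr_requirement are hypothetical and are not taken from the DocRouter source.

```python
from enum import Enum
from typing import Dict, Optional


class OcrMode(str, Enum):
    """The four OCR modes listed above; an organization selects exactly one."""
    TEXTRACT = "textract"
    MISTRAL = "mistral"
    LLM = "llm"
    PYMUPDF = "pymupdf"


# External dependency per mode, summarized from the table (None = none needed).
OCR_REQUIREMENT: Dict[OcrMode, Optional[str]] = {
    OcrMode.TEXTRACT: "AWS (Textract and S3)",
    OcrMode.MISTRAL: "Mistral API",
    OcrMode.LLM: "a LiteLLM provider, model, and credentials",
    OcrMode.PYMUPDF: None,
}


def ocr_requirement(mode: str) -> Optional[str]:
    """Return what a mode depends on, or None if no vendor cloud is needed."""
    # OcrMode(mode) raises ValueError for a string that is not a known mode.
    return OCR_REQUIREMENT[OcrMode(mode)]
```

A lookup like this makes it easy to warn an admin who selects textract without AWS configured, while pymupdf can always be selected because it reads only the text embedded in the PDF.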