On-Prem Installation
Deploy DocRouter on your servers (Docker Compose or Kubernetes) while connecting to managed cloud APIs for OCR, storage, email, and LLM inference. For a quick install, start with the Open Source page; this guide covers AWS account setup, IAM, Terraform, LLM configuration, and Vertex AI Gemini in detail.
Overview
DocRouter is a multi-service application that runs on your servers but calls out to managed cloud APIs for document OCR, object storage, email, and LLM inference.
Figure 1: On-prem installation architecture — customer infrastructure, AWS account, and optional third-party LLM APIs.
Typical on-prem stack
| Component | Where it runs | Notes |
|---|---|---|
| Frontend (Next.js) | Docker Compose / K8s | Port 3000 |
| Backend (FastAPI) | Docker Compose / K8s | Port 8000 |
| Workers | Docker Compose / K8s | OCR, LLMs, and flows (queue-based processing) |
| MongoDB | Embedded, Atlas, or DocumentDB | App state, encrypted credentials, login passwords |
| AWS | Customer AWS account | S3, Textract, SES (optional), Bedrock (optional) |
| Third-party LLM APIs | Vendor SaaS (optional) | OpenAI, Anthropic, Vertex AI & Gemini, Azure, etc. |
Credentials are stored encrypted in MongoDB (cloud_config for deployment-wide AWS/GCP/Azure; llm_providers for per-provider API keys; user login passwords). On first startup, values from .env are seeded into the database when admin bootstrap completes.
Step 1 — Create an AWS account
- Go to https://aws.amazon.com and create an account (or use an existing organizational account).
- Enable MFA on the root user; do not use root credentials for DocRouter.
- Create an IAM admin user (or use AWS IAM Identity Center) for console and Terraform work.
- Choose a primary region. DocRouter defaults to us-east-1 for AWS clients (S3, Textract, SES, Bedrock). Keep S3, Textract, and SES in the same region when possible.
- (Recommended) Enable AWS CloudTrail in the account for audit and troubleshooting.
Step 2 — Provision AWS resources
You can provision infrastructure manually (console) or with the reference Terraform in the analytiq-terraform sandbox (applications/docrouter), which wraps modules/docrouter.
Option A — Terraform (recommended)
Clone the sandbox and apply from the DocRouter application module:
git clone https://github.com/analytiq-hub/analytiq-terraform.git
cd analytiq-terraform/applications/docrouter
terraform init
terraform apply -var='prefix=docrouter' -var='bucket_name=your-company-docrouter-data'
Capture outputs:
| Output | Use in DocRouter |
|---|---|
app_access_key |
AWS_ACCESS_KEY_ID |
app_secret_key |
AWS_SECRET_ACCESS_KEY |
bucket_name |
AWS_S3_BUCKET_NAME |
app_role_arn |
Verify role exists (see naming below) |
Option B — Manual console setup
Follow the IAM layout below so it matches what the application expects.
Step 3 — IAM roles, users, and permissions
The reference Terraform module modules/docrouter in analytiq-terraform creates a user + role pattern. DocRouter authenticates with the IAM user access keys, then assumes an IAM role for S3 and Textract. Bedrock is invoked with the user credentials directly (not the assumed role).
Naming convention (required)
The backend derives the role ARN from the IAM user name:
- IAM user:
{prefix}-app-user(e.g.docrouter-app-user) - IAM role:
{prefix}-app-role(e.g.docrouter-app-role)
If the user name does not follow {prefix}-{suffix}-user, role assumption fails and Textract/S3 will not work. Use the Terraform module or mirror its names exactly.
Resources created by Terraform
| Resource | Name pattern | Purpose |
|---|---|---|
| IAM user | {prefix}-app-user |
Long-lived access keys in .env / admin UI |
| IAM role | {prefix}-app-role |
Assumed for S3 and Textract |
| S3 bucket | bucket_name variable |
Document storage and Textract staging |
| Access key | (attached to user) | Programmatic access |
Permissions on the IAM role ({prefix}-app-role)
| Permission source | AWS service | What DocRouter uses it for |
|---|---|---|
Managed policy AmazonTextractFullAccess |
Amazon Textract | OCR on uploaded PDFs and images |
Managed policy AmazonSESFullAccess |
Amazon SES | Invitation, verification, and notification email |
Inline policy {prefix}-app-role-s3-access |
Amazon S3 | GetObject, PutObject, ListBucket, DeleteObject on the DocRouter bucket only |
S3 bucket settings from Terraform:
- Server-side encryption: AES-256
- Bucket policy grants the app role
s3:*on bucket objects
Permissions on the IAM user ({prefix}-app-user)
| Policy | Actions | Why on the user (not the role) |
|---|---|---|
Inline assume-{prefix}-app-role |
sts:AssumeRole on {prefix}-app-role |
Allows the app to assume the role for S3/Textract |
Inline {prefix}-bedrock-user-policy |
bedrock:InvokeModel on foundation models and inference profiles |
Bedrock auth uses user keys, not assumed-role credentials |
Bedrock policy (from Terraform):
{
"Effect": "Allow",
"Action": "bedrock:InvokeModel",
"Resource": [
"arn:aws:bedrock:*::foundation-model/*",
"arn:aws:bedrock:*:*:inference-profile/*"
]
}
Minimal manual IAM policy (if not using managed policies)
For least-privilege manual setup, attach equivalent permissions to the role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::YOUR-BUCKET-NAME",
"arn:aws:s3:::YOUR-BUCKET-NAME/*"
]
},
{
"Effect": "Allow",
"Action": [
"textract:StartDocumentAnalysis",
"textract:GetDocumentAnalysis",
"textract:StartDocumentTextDetection",
"textract:GetDocumentTextDetection"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ses:SendEmail",
"ses:SendRawEmail"
],
"Resource": "*"
}
]
}
On the user, add sts:AssumeRole for the role ARN and bedrock:InvokeModel if you use Bedrock.
Step 4 — AWS services used by DocRouter
| Service | Required? | Used for |
|---|---|---|
| Amazon S3 | Yes (for default OCR pipeline) | Document binaries, Textract temp uploads, exports |
| Amazon Textract | Yes (unless org OCR mode is changed) | Default OCR engine (textract mode) |
| Amazon SES | Optional | Outbound email when SES_FROM_EMAIL is set; sender must be verified in SES |
| Amazon Bedrock | Optional | Claude and embedding models via bedrock LLM provider |
| AWS STS | Yes (with role pattern) | AssumeRole from user to role |
DocRouter does not require AWS Lambda, ECS, or EKS for on-prem installs; only the APIs above.
Bedrock model access
After IAM is in place, open the Amazon Bedrock console → Model access (or Foundation models) and enable the models you plan to use, for example:
us.anthropic.claude-sonnet-4-6us.anthropic.claude-opus-4-6-v1cohere.embed-v4:0amazon.titan-embed-text-v2:0
Then enable the Bedrock provider in DocRouter (see LLM section below).
SES setup (optional)
- Verify the sender address or domain in SES.
- If the account is in the SES sandbox, verify recipient addresses or request production access.
- Set
SES_FROM_EMAILin.envto the verified sender.
Step 5 — Configure AWS in DocRouter
Via .env (first boot)
Add to the project root .env (see .env.example.aws_lightsail in the doc-router repository):
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_S3_BUCKET_NAME=your-company-docrouter-data
SES_FROM_EMAIL=noreply@yourdomain.com # optional
On startup, after the admin user exists, these values are encrypted and stored in cloud_config (type: "aws").
Via admin UI (ongoing)
Account → Development → AWS setup (/settings/account/development/aws-config)
Stores access key ID, secret access key, and S3 bucket name in cloud_config. This overrides .env for runtime behavior.
Verify
Check backend logs after restart:
- Success: AWS clients initialize and Textract runs on document upload
- Failure:
AWS credentials are not correctorAWS role assumption failed— usually wrong keys, missingAssumeRole, or user/role naming mismatch
Step 6 — LLM providers and API keys
DocRouter routes all LLM calls through LiteLLM. Providers are defined in code (packages/python/analytiq_data/llm/providers.py), seeded into MongoDB on startup, and configured per deployment. See the Platform page for a summary of supported clouds and providers.
Supported providers (summary)
| Provider | Auth | Env variable | Default enabled |
|---|---|---|---|
| OpenAI | API key | OPENAI_API_KEY |
Yes |
| Anthropic | API key | ANTHROPIC_API_KEY |
Yes |
| Gemini (Google AI API) | API key | GEMINI_API_KEY |
Yes |
| Mistral | API key | MISTRAL_API_KEY |
Yes |
| Groq | API key | GROQ_API_KEY |
Yes |
| xAI | API key | XAI_API_KEY |
Yes |
| OpenRouter | API key | OPENROUTER_API_KEY |
Yes |
| Azure OpenAI | API key | AZURE_OPENAI_API_KEY |
No |
| Microsoft Foundry | Service principal | AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_API_BASE |
No |
| AWS Bedrock | AWS user keys | (same as AWS above) | No |
| Google Vertex AI | GCP service account JSON | GCP setup UI | No |
How to add API keys
Option 1 — .env at install time
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
MISTRAL_API_KEY=...
GEMINI_API_KEY=...
GROQ_API_KEY=...
On first startup, setup_llm_providers copies non-empty env values into encrypted llm_providers.token fields.
Option 2 — Admin UI (preferred after install)
- Sign in as admin.
- Go to Account → Development → LLM Manager (
/settings/account/development/llm-manager). - Open a provider (e.g. OpenAI, Mistral).
- Paste the API key, enable the provider, and select which models are enabled.
- Use Test on a model to confirm connectivity.
API: PUT /v0/account/llm/provider/{provider_name} with { "token": "...", "enabled": true, ... }.
Bedrock — no separate API key; enable the provider and ensure AWS user has bedrock:InvokeModel and models are enabled in the Bedrock console.
Vertex AI — do not set a token on the LLM provider; use GCP setup (next section).
Recommended models for document extraction
For speed and cost on production document workflows:
| Use case | Recommended model | Provider |
|---|---|---|
| Document processing | gemini/gemini-3-flash-preview |
Vertex AI or Gemini API |
| Document Agent | claude-opus-4-6 |
AWS Bedrock or Anthropic |
| Chat With Knowledge Base | openai-5.2 | OpenAI |
See knowledge_base/prompts.md in the doc-router repository for model-selection guidance used by built-in agents.
Step 7 — Enable Vertex AI Gemini
Vertex AI is the recommended path when you want Gemini in a GCP project (enterprise billing, VPC-SC, data residency) rather than the consumer Gemini API (GEMINI_API_KEY).
Vertex also unlocks Mistral Vertex OCR (mistral_vertex OCR mode), which uses the same GCP service account.
7.1 Create a GCP project
- Open Google Cloud Console.
- Create a project (note the Project ID).
- Enable billing on the project.
7.2 Enable APIs
In APIs & Services → Enable APIs, enable:
- Vertex AI API (
aiplatform.googleapis.com)
For Mistral Vertex OCR only, no extra publisher API is required beyond Vertex AI and Cloud Platform scope.
7.3 Create a service account
- IAM & Admin → Service Accounts → Create.
- Name example:
docrouter-vertex. - Grant roles (minimum practical set):
- Vertex AI User (
roles/aiplatform.user)
- Vertex AI User (
- Create a JSON key and download it securely.
7.4 Configure DocRouter
Option A — Admin UI
- Account → Development → GCP setup (
/settings/account/development/gcp-config). - Paste the full service account JSON.
- Save.
Option B — Environment (legacy / automation)
Some deployments set GOOGLE_APPLICATION_CREDENTIALS to a JSON file path on the server. The primary supported path for on-prem is the GCP setup UI, which stores encrypted JSON in cloud_config (type: "gcp").
Optional — region
Set in .env:
VERTEX_AI_LOCATION=global
Default is global. LiteLLM uses this as vertex_location with vertex_project from the service account project_id. For Mistral Vertex OCR, the Mistral endpoint is fixed to us-central1 in code.
7.5 Enable Vertex models in LLM Manager
- Account → Development → LLM Manager → Google Vertex AI.
- Enable the provider.
- Enable models, for example:
vertex_ai/gemini-2.5-flash— strong quality/latency balance for extractionvertex_ai/gemini-3.1-flash-lite-preview— newer flash tiervertex_ai/gemini-embedding-001— embeddings for knowledge bases
- Run Test on
vertex_ai/gemini-2.5-flash.
Gemini API vs Vertex AI
Gemini API (GEMINI_API_KEY) |
Vertex AI (GCP service account) | |
|---|---|---|
| Setup | API key from Google AI Studio | GCP project + service account JSON |
| Model prefix | gemini/gemini-3-flash-preview |
vertex_ai/gemini-2.5-flash |
| Best for | Quick start, dev | Enterprise GCP, Mistral Vertex OCR |
| DocRouter UI | LLM Manager → Gemini | GCP setup + LLM Manager → Vertex AI |
For the best quality/performance combo on Vertex, start with vertex_ai/gemini-2.5-flash; move to vertex_ai/gemini-3.1-flash-lite-preview when you want the newest flash generation.
Step 8 — End-to-end checklist after AWS account creation
Use this ordered checklist for a typical on-prem deployment.
AWS
- Create S3 bucket (encrypted, private, correct region)
- Create IAM role
{prefix}-app-rolewith Textract, SES, and S3 bucket access - Create IAM user
{prefix}-app-userwith access key - Grant user
sts:AssumeRoleon the role - Grant user
bedrock:InvokeModelif using Bedrock - (Optional) Verify SES sender and set
SES_FROM_EMAIL - (Optional) Enable Bedrock foundation models in the console
- Put
AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_S3_BUCKET_NAMEin.envor AWS setup UI
Application platform
- Provision server or containers (see Open Source for Docker Compose and Helm)
- Install MongoDB and set
MONGODB_URI - Set
ADMIN_EMAIL,ADMIN_PASSWORD,NEXTAUTH_SECRET,NEXTAUTH_URL - Deploy frontend, backend, worker, and reverse proxy (nginx)
- Confirm migrations ran and admin can log in
LLM
- Add API keys via
.envor LLM Manager (OpenAI, Mistral, etc.) - Enable chosen models per organization prompts and flows
- (Optional) Enable Bedrock provider after AWS + model access
- (Optional) Complete GCP Vertex setup and enable
vertex_ai/gemini-2.5-flash
Validation
- Upload a PDF; confirm Textract OCR completes
- Run a prompt against your chosen flash model
- (Optional) Send a test email if using SES
- (Optional) Test Bedrock or Vertex from LLM Manager
Credential storage reference
| Credential | Storage | Admin UI |
|---|---|---|
| AWS keys + bucket | cloud_config type: "aws" |
Account → Development → AWS setup |
| GCP service account | cloud_config type: "gcp" |
Account → Development → GCP setup |
| Azure Foundry SP | cloud_config type: "azure" |
Account → Development → Azure setup |
| OpenAI, Mistral, etc. | llm_providers.token (encrypted) |
Account → Development → LLM Manager |
Related Terraform sandbox
The canonical IAM and S3 layout lives in the analytiq-terraform GitHub sandbox:
- applications/docrouter/main.tf — application entrypoint
- modules/docrouter/main.tf — IAM user, role, policies, S3 bucket
When in doubt, clone the sandbox, run terraform apply from applications/docrouter, and use the outputs in DocRouter .env.
DocRouter.AI