Overview
The DocRouter Python SDK is a client library for the DocRouter API. It provides programmatic access to your documents, OCR data, LLM analysis, schemas, prompts, and tags.
Use it to integrate DocRouter's document processing into your Python applications, automate document workflows, and extract structured data from your documents.
Installation
You can install the DocRouter SDK directly from GitHub:
pip install git+https://github.com/analytiq-hub/doc-router.git#subdirectory=packages
Alternatively, you can clone the repository and install in development mode:
git clone https://github.com/analytiq-hub/doc-router.git
cd doc-router/packages
pip install -e .
Quick Start
To get started with the DocRouter SDK:
- Get your DocRouter organization ID from the URL, e.g. https://app.docrouter.ai/orgs/<docrouter_org_id>
- Create an organization API token from your organization settings
- Initialize the client and start making API calls
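The first step above can be scripted when you have a copied browser URL on hand. The following is a minimal sketch using only the standard library; the helper name `org_id_from_url` is hypothetical and not part of the SDK, and it assumes the organization ID is the path segment that directly follows `orgs`, as in the example URL.

```python
from urllib.parse import urlparse


def org_id_from_url(url: str) -> str:
    """Return the path segment that follows 'orgs' in a DocRouter URL.

    Hypothetical helper (not part of the SDK); assumes the
    https://app.docrouter.ai/orgs/<docrouter_org_id> layout shown above.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        return segments[segments.index("orgs") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No organization ID found in URL: {url!r}") from None


print(org_id_from_url("https://app.docrouter.ai/orgs/abc123/documents"))
```

The returned value is what the later examples pass as `organization_id`.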
Basic Usage
from docrouter_sdk import DocRouterClient
# Initialize the client
client = DocRouterClient(
    base_url="https://app.docrouter.ai/fastapi",  # Replace with your API URL
    api_token="your_org_api_token"  # Replace with your API token
)
# Example: List documents
organization_id = "your_organization_id" # Replace with your organization ID
documents = client.documents.list(organization_id)
print(f"Found {documents.total_count} documents")
# Example: List tags
tags = client.tags.list(organization_id)
print(f"Found {tags.total_count} tags")
# Example: List available LLM models
models = client.llm.list_models()
print(f"Available LLM models: {[model.name for model in models.models]}")
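Rather than hard-coding the token and organization ID as in the snippet above, you may prefer to read them from the environment. This is a sketch under stated assumptions: the variable names `DOCROUTER_URL`, `DOCROUTER_API_TOKEN`, and `DOCROUTER_ORG_ID` and the helper `load_docrouter_settings` are hypothetical conventions chosen for illustration, not names the SDK is known to read.

```python
import os
from typing import Mapping, Optional


def load_docrouter_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect connection settings from environment-style variables.

    The variable names are hypothetical; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    settings = {
        "base_url": env.get("DOCROUTER_URL", "https://app.docrouter.ai/fastapi"),
        "api_token": env.get("DOCROUTER_API_TOKEN", ""),
        "organization_id": env.get("DOCROUTER_ORG_ID", ""),
    }
    # Fail early with a clear message instead of sending an empty token.
    missing = [key for key in ("api_token", "organization_id") if not settings[key]]
    if missing:
        raise RuntimeError(f"Missing DocRouter settings: {', '.join(missing)}")
    return settings


# Demonstration with an explicit mapping instead of the real environment.
settings = load_docrouter_settings({
    "DOCROUTER_API_TOKEN": "your_org_api_token",
    "DOCROUTER_ORG_ID": "your_organization_id",
})
print(settings["base_url"])
```

The resulting values can then be passed to the client as `DocRouterClient(base_url=settings["base_url"], api_token=settings["api_token"])`.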
SDK Modules
Documents API
Manage documents in your workspace
- List documents with optional filtering
- Upload new documents
- Get document details
- Update document properties
- Delete documents
OCR API
Access document OCR data
- Get OCR text from documents
- Get OCR text for specific pages
- Get OCR blocks with position data
- Access document OCR metadata
LLM API
Run and manage LLM analysis
- List available LLM models
- Run LLM analysis on documents
- Get LLM extraction results
- Update and verify extraction results
- Delete LLM results
Schemas API
Manage extraction schemas
- Create new extraction schemas
- List existing schemas
- Get schema details
- Update schemas
- Delete schemas
- Validate data against schemas
Prompts API
Manage extraction prompts
- Create new prompts
- List existing prompts
- Get prompt details
- Update prompts
- Delete prompts
Tags API
Manage document tags
- Create new tags
- List existing tags
- Update tags
- Delete tags
Code Examples
Documents API
# List documents
documents = client.documents.list(organization_id, skip=0, limit=10, tag_ids=["tag1", "tag2"])
# Get a document (document_id is the ID of an existing document)
document = client.documents.get(organization_id, document_id)
# Update a document
client.documents.update(organization_id, document_id, document_name="New Name", tag_ids=["tag1"])
# Delete a document
client.documents.delete(organization_id, document_id)
# Upload a document
import base64
with open("sample.pdf", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")
result = client.documents.upload(organization_id, [{
    "name": "sample.pdf",
    "content": content,
    "tag_ids": []
}])
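When uploading several files, the base64 encoding step shown above can be wrapped in a small function. The sketch below builds one upload item with the same three keys (`name`, `content`, `tag_ids`) used in the upload example; the helper name `build_upload_item` is hypothetical, and the demonstration writes a throwaway file so it runs without a real PDF.

```python
import base64
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def build_upload_item(path: str, tag_ids: Optional[Iterable[str]] = None) -> dict:
    """Read a file and return a dict shaped like the upload items above.

    Hypothetical helper, not part of the SDK.
    """
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return {
        "name": file_path.name,
        "content": encoded,
        "tag_ids": list(tag_ids or []),
    }


# Demonstration with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"%PDF-1.4 demo")
    item = build_upload_item(str(sample), tag_ids=["tag1"])

print(item["name"], item["tag_ids"])
```

A list of such items could then be passed as the second argument to `client.documents.upload(organization_id, [...])`.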
OCR API
# Get OCR blocks
blocks = client.ocr.get_blocks(organization_id, document_id)
# Get OCR text
text = client.ocr.get_text(organization_id, document_id, page_num=1)
# Get OCR metadata
metadata = client.ocr.get_metadata(organization_id, document_id)
print(f"Number of pages: {metadata.n_pages}")
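Because `get_text` accepts a `page_num` and the metadata reports `n_pages`, the two can be combined to collect text page by page. The following sketch takes the page fetcher as a plain callable so it runs without network access. It assumes page numbers start at 1 and that the call returns a string; neither is confirmed by the examples above. The name `collect_page_texts` is hypothetical.

```python
from typing import Callable


def collect_page_texts(get_page_text: Callable[[int], str], n_pages: int) -> str:
    """Fetch each page in turn and join the results with page markers."""
    parts = []
    for page_num in range(1, n_pages + 1):  # assumption: 1-based page numbers
        parts.append(f"--- page {page_num} ---\n{get_page_text(page_num)}")
    return "\n".join(parts)


# Stand-in for the SDK call so the sketch is self-contained.
fake_pages = {1: "Invoice 42", 2: "Total: 10.00"}
full_text = collect_page_texts(lambda page_num: fake_pages[page_num], n_pages=2)
print(full_text)
```

With a real client, the callable might be `lambda n: client.ocr.get_text(organization_id, document_id, page_num=n)` and the page count `metadata.n_pages`.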
LLM API
# List LLM models
models = client.llm.list_models()
# Run LLM analysis
result = client.llm.run(organization_id, document_id, prompt_id="default", force=False)
# Get LLM result
llm_result = client.llm.get_result(organization_id, document_id, prompt_id="default")
# Update LLM result
updated_result = client.llm.update_result(
    organization_id,
    document_id,
    updated_llm_result={"key": "value"},
    prompt_id="default",
    is_verified=True
)
# Delete LLM result
client.llm.delete_result(organization_id, document_id, prompt_id="default")
Schemas API
# Create a schema
schema_config = {
    "name": "Invoice Schema",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_date": {
                        "type": "string",
                        "description": "invoice date"
                    }
                },
                "required": ["invoice_date"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
}
new_schema = client.schemas.create(organization_id, schema_config)
# List schemas
schemas = client.schemas.list(organization_id)
# Get a schema
schema = client.schemas.get(organization_id, schema_id)
# Validate data against a schema
validation_result = client.schemas.validate(organization_id, schema_id, {"invoice_date": "2023-01-01"})
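The nested `schema_config` structure above is easy to mistype by hand. As a sketch, a small builder can generate it from a mapping of field names to descriptions. The helper `build_schema_config` is hypothetical, and it makes two simplifying assumptions: every field is a string, and every field is required.

```python
from typing import Dict


def build_schema_config(name: str, extraction_name: str, fields: Dict[str, str]) -> dict:
    """Build a schema config shaped like the Invoice Schema example above.

    Hypothetical helper; treats every field as a required string.
    """
    properties = {
        field: {"type": "string", "description": description}
        for field, description in fields.items()
    }
    return {
        "name": name,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": extraction_name,
                "schema": {
                    "type": "object",
                    "properties": properties,
                    "required": list(properties),
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    }


schema_config = build_schema_config(
    "Invoice Schema",
    "invoice_extraction",
    {"invoice_date": "invoice date", "total_amount": "invoice total"},
)
print(schema_config["response_format"]["json_schema"]["schema"]["required"])
```

The result has the same shape as the dictionary passed to `client.schemas.create(organization_id, schema_config)`.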
Prompts API
# Create a prompt
prompt_config = {
    "name": "Invoice Extractor",
    "content": "Extract the following fields from the invoice...",
    "schema_id": "schema_id_here",
    "schema_version": 1,
    "tag_ids": ["tag1", "tag2"],
    "model": "gpt-4o-mini"
}
new_prompt = client.prompts.create(organization_id, prompt_config)
# List prompts
prompts = client.prompts.list(organization_id, document_id="doc_id", tag_ids=["tag1"])
# Get a prompt
prompt = client.prompts.get(organization_id, prompt_id)
Tags API
# Create a tag
tag_config = {
    "name": "Invoices",
    "color": "#FF5733",
    "description": "All invoice documents"
}
new_tag = client.tags.create(organization_id, tag_config)
# List tags
tags = client.tags.list(organization_id)
# Update a tag
updated_tag = client.tags.update(organization_id, tag_id, tag_config)
Error Handling
The SDK provides detailed error messages when API calls fail:
try:
    result = client.documents.get(organization_id, "invalid_id")
except Exception as e:
    print(f"API Error: {e}")
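For transient failures such as network interruptions, one option is to wrap a call in a retry loop. The sketch below is a generic pattern, not an SDK feature: `call_with_retries` is a hypothetical helper, and it catches the broad `Exception` only because the SDK's specific exception types are not listed here. Retrying will not help with permanent errors such as an invalid document ID.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff when it raises."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose: SDK exception types are unknown here
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")


# Stand-in operation that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["count"])
```

With a real client, the operation might be `lambda: client.documents.get(organization_id, document_id)`.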
SDK Structure
The DocRouter SDK is organized into several modules:
docrouter_sdk/
├── __init__.py # Package initialization
├── api/ # API client modules
│ ├── __init__.py
│ ├── client.py # Main client class
│ ├── documents.py # Documents API
│ ├── llm.py # LLM API
│ ├── ocr.py # OCR API
│ ├── prompts.py # Prompts API
│ ├── schemas.py # Schemas API
│ └── tags.py # Tags API
├── models/ # Data models
│ ├── __init__.py
│ ├── document.py # Document models
│ ├── llm.py # LLM models
│ ├── ocr.py # OCR models
│ ├── prompt.py # Prompt models
│ ├── schema.py # Schema models
│ └── tag.py # Tag models
└── examples/ # Usage examples
    ├── README.md
    └── basic_docrouter_client.py
GitHub Repository
The DocRouter Python SDK is part of the docrouter.ai open source project. You can find the source code on GitHub at https://github.com/analytiq-hub/doc-router.