DocRouter Python SDK

Python client library for interacting with docrouter.ai

Overview

The DocRouter Python SDK is a Python client for the DocRouter API. It gives programmatic access to your documents, OCR data, LLM analysis, schemas, prompts, and tags.

Use it to integrate DocRouter's document processing into your Python applications, automate document workflows, and extract structured data from your documents.

Installation

You can install the DocRouter SDK directly from GitHub:

pip install git+https://github.com/analytiq-hub/doc-router.git#subdirectory=packages

Alternatively, you can clone the repository and install in development mode:

git clone https://github.com/analytiq-hub/doc-router.git
cd doc-router/packages
pip install -e .

Quick Start

To get started with the DocRouter SDK:

  1. Get your DocRouter organization ID from the URL, e.g. https://app.docrouter.ai/orgs/<docrouter_org_id>
  2. Create an organization API token from your organization settings
  3. Initialize the client and start making API calls
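
One way to keep the organization ID and API token from the steps above out of source code is to read them from environment variables. The following is a minimal sketch, not part of the SDK: the variable names DOCROUTER_URL, DOCROUTER_ORG_ID, and DOCROUTER_ORG_API_TOKEN and the DocRouterSettings helper are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DocRouterSettings:
    """Hypothetical container for connection settings (not an SDK class)."""
    base_url: str
    organization_id: str
    api_token: str


def load_settings(env=None):
    """Read settings from a mapping, defaulting to os.environ.

    The variable names are assumptions for this example.
    """
    env = os.environ if env is None else env
    required = ("DOCROUTER_ORG_ID", "DOCROUTER_ORG_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DocRouterSettings(
        # Fall back to the hosted API URL used elsewhere in this document.
        base_url=env.get("DOCROUTER_URL", "https://app.docrouter.ai/fastapi"),
        organization_id=env["DOCROUTER_ORG_ID"],
        api_token=env["DOCROUTER_ORG_API_TOKEN"],
    )
```

The loaded values can then be passed to DocRouterClient(base_url=..., api_token=...) in place of the hard-coded placeholders.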

Basic Usage

from docrouter_sdk import DocRouterClient

# Initialize the client
client = DocRouterClient(
    base_url="https://app.docrouter.ai/fastapi",  # Replace with your API URL
    api_token="your_org_api_token"               # Replace with your API token
)

# Example: List documents
organization_id = "your_organization_id"  # Replace with your organization ID
documents = client.documents.list(organization_id)
print(f"Found {documents.total_count} documents")

# Example: List tags
tags = client.tags.list(organization_id)
print(f"Found {tags.total_count} tags")

# Example: List available LLM models
models = client.llm.list_models()
print(f"Available LLM models: {[model.name for model in models.models]}")

SDK Modules

Documents API

Manage documents in your workspace

  • List documents with optional filtering
  • Upload new documents
  • Get document details
  • Update document properties
  • Delete documents

OCR API

Access document OCR data

  • Get OCR text from documents
  • Get OCR text for specific pages
  • Get OCR blocks with position data
  • Access document OCR metadata

LLM API

Run and manage LLM analysis

  • List available LLM models
  • Run LLM analysis on documents
  • Get LLM extraction results
  • Update and verify extraction results
  • Delete LLM results

Schemas API

Manage extraction schemas

  • Create new extraction schemas
  • List existing schemas
  • Get schema details
  • Update schemas
  • Delete schemas
  • Validate data against schemas

Prompts API

Manage extraction prompts

  • Create new prompts
  • List existing prompts
  • Get prompt details
  • Update prompts
  • Delete prompts

Tags API

Manage document tags

  • Create new tags
  • List existing tags
  • Update tags
  • Delete tags

Code Examples

Documents API

# List documents
documents = client.documents.list(organization_id, skip=0, limit=10, tag_ids=["tag1", "tag2"])

# Get a document
document = client.documents.get(organization_id, document_id)

# Update a document
client.documents.update(organization_id, document_id, document_name="New Name", tag_ids=["tag1"])

# Delete a document
client.documents.delete(organization_id, document_id)

# Upload a document (file content is sent as a base64-encoded string)
import base64

with open("sample.pdf", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

result = client.documents.upload(organization_id, [{
    "name": "sample.pdf",
    "content": content,
    "tag_ids": []
}])
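
When uploading several files, the base64 encoding step above can be wrapped in a small helper. This is a sketch only: build_upload_item is a hypothetical name, not an SDK function, and it assumes the upload item has exactly the "name", "content", and "tag_ids" keys shown in the example above.

```python
import base64
from pathlib import Path


def build_upload_item(path, tag_ids=None):
    """Build one upload entry from a file on disk (hypothetical helper).

    Mirrors the dict shape used in the upload example:
    {"name": ..., "content": <base64 string>, "tag_ids": [...]}.
    """
    file_path = Path(path)
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return {
        "name": file_path.name,
        "content": encoded,
        "tag_ids": list(tag_ids or []),
    }
```

A list of such items could then be passed as the second argument to client.documents.upload(organization_id, items), following the call shown above.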

OCR API

# Get OCR blocks
blocks = client.ocr.get_blocks(organization_id, document_id)

# Get OCR text
text = client.ocr.get_text(organization_id, document_id, page_num=1)

# Get OCR metadata
metadata = client.ocr.get_metadata(organization_id, document_id)
print(f"Number of pages: {metadata.n_pages}")
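
The metadata and per-page text calls above can be combined to collect the text of every page. The sketch below assumes that page numbers start at 1 and run through metadata.n_pages; the source only shows page_num=1, so the numbering is an assumption to verify against the API. get_all_page_texts is a hypothetical helper, not an SDK method.

```python
def get_all_page_texts(client, organization_id, document_id):
    """Return a list with the OCR result of each page, in page order.

    Assumes 1-based page numbering (an assumption, not confirmed by the source).
    """
    metadata = client.ocr.get_metadata(organization_id, document_id)
    pages = []
    for page_num in range(1, metadata.n_pages + 1):
        # One request per page, using the page_num keyword from the example above.
        pages.append(
            client.ocr.get_text(organization_id, document_id, page_num=page_num)
        )
    return pages
```

This issues one request per page, so for long documents a single call without page_num may be preferable if the API returns the full text in that case.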

LLM API

# List LLM models
models = client.llm.list_models()

# Run LLM analysis
result = client.llm.run(organization_id, document_id, prompt_id="default", force=False)

# Get LLM result
llm_result = client.llm.get_result(organization_id, document_id, prompt_id="default")

# Update LLM result
updated_result = client.llm.update_result(
    organization_id,
    document_id,
    updated_llm_result={"key": "value"},
    prompt_id="default",
    is_verified=True
)

# Delete LLM result
client.llm.delete_result(organization_id, document_id, prompt_id="default")

Schemas API

# Create a schema
schema_config = {
    "name": "Invoice Schema",
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_date": {
                        "type": "string",
                        "description": "invoice date"
                    }
                },
                "required": ["invoice_date"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
}
new_schema = client.schemas.create(organization_id, schema_config)

# List schemas
schemas = client.schemas.list(organization_id)

# Get a schema
schema = client.schemas.get(organization_id, schema_id)

# Validate data against a schema
validation_result = client.schemas.validate(organization_id, schema_id, {"invoice_date": "2023-01-01"})
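
The nested schema_config dictionary above is verbose to write by hand for schemas with many fields. As a sketch, a hypothetical build_schema_config helper (not part of the SDK) can generate the same structure from a flat field list. It assumes every field is required and that additionalProperties and strict keep the values used in the example above.

```python
def build_schema_config(name, schema_name, fields):
    """Build a schema_config dict in the shape shown in the example above.

    fields maps each field name to a (json_type, description) tuple.
    All fields are marked required (an assumption of this sketch).
    """
    properties = {
        field: {"type": json_type, "description": description}
        for field, (json_type, description) in fields.items()
    }
    return {
        "name": name,
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": schema_name,
                "schema": {
                    "type": "object",
                    "properties": properties,
                    "required": list(fields),
                    "additionalProperties": False,
                },
                "strict": True,
            },
        },
    }
```

For instance, build_schema_config("Invoice Schema", "invoice_extraction", {"invoice_date": ("string", "invoice date")}) produces the same dictionary as the hand-written schema_config above.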

Prompts API

# Create a prompt
prompt_config = {
    "name": "Invoice Extractor",
    "content": "Extract the following fields from the invoice...",
    "schema_id": "schema_id_here",
    "schema_version": 1,
    "tag_ids": ["tag1", "tag2"],
    "model": "gpt-4o-mini"
}
new_prompt = client.prompts.create(organization_id, prompt_config)

# List prompts
prompts = client.prompts.list(organization_id, document_id="doc_id", tag_ids=["tag1"])

# Get a prompt
prompt = client.prompts.get(organization_id, prompt_id)

Tags API

# Create a tag
tag_config = {
    "name": "Invoices",
    "color": "#FF5733",
    "description": "All invoice documents"
}
new_tag = client.tags.create(organization_id, tag_config)

# List tags
tags = client.tags.list(organization_id)

# Update a tag
updated_tag = client.tags.update(organization_id, tag_id, tag_config)

Error Handling

When an API call fails, the SDK raises an exception whose message describes the failure:

try:
    result = client.documents.get(organization_id, "invalid_id")
except Exception as e:
    print(f"API Error: {e}")
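
If many calls need the same handling, the try/except pattern above can be factored into a wrapper. The sketch below uses a hypothetical call_or_default helper and the standard logging module. It catches the generic Exception because the SDK's specific exception types are not documented here; narrowing the except clause is advisable once they are known.

```python
import logging

logger = logging.getLogger("docrouter_example")  # hypothetical logger name


def call_or_default(func, *args, default=None, **kwargs):
    """Call func(*args, **kwargs); on any exception, log it and return default."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:  # broad on purpose: SDK exception types are an unknown here
        name = getattr(func, "__name__", repr(func))
        logger.warning("DocRouter call %s failed: %s", name, exc)
        return default
```

A call such as call_or_default(client.documents.get, organization_id, document_id) would then return None instead of raising when the request fails.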

SDK Structure

The DocRouter SDK is organized into several modules:

docrouter_sdk/
├── __init__.py                 # Package initialization
├── api/                        # API client modules
│   ├── __init__.py
│   ├── client.py               # Main client class
│   ├── documents.py            # Documents API
│   ├── llm.py                  # LLM API
│   ├── ocr.py                  # OCR API
│   ├── prompts.py              # Prompts API
│   ├── schemas.py              # Schemas API
│   └── tags.py                 # Tags API
├── models/                     # Data models
│   ├── __init__.py
│   ├── document.py             # Document models
│   ├── llm.py                  # LLM models
│   ├── ocr.py                  # OCR models
│   ├── prompt.py               # Prompt models
│   ├── schema.py               # Schema models
│   └── tag.py                  # Tag models
└── examples/                   # Usage examples
    ├── README.md
    └── basic_docrouter_client.py

GitHub Repository

The DocRouter Python SDK is part of the docrouter.ai open source project. The source code is available on GitHub at https://github.com/analytiq-hub/doc-router.