Schemas

Schemas define how structured data is extracted

Use JSON schemas to ensure consistent, validated output from your prompts.

Get started in 3 steps

1. Design your schema

Identify the fields you need to extract and choose appropriate types (string, number, array, object).

2. Create the schema

Use the schema editor or API to define your fields with clear descriptions. All fields are required in strict mode.

3. Link to prompts

Connect your schema to prompts so extracted data matches your defined structure exactly.


What are Schemas?

Schemas define the structure and format of data extracted from documents. They use JSON Schema format with strict mode enabled, ensuring 100% adherence to your defined structure.

  • Structured output: Data is returned in consistent JSON format
  • Type validation: Fields are validated against defined types
  • Strict mode: All fields are required, ensuring complete output
  • No post-processing: Output is immediately usable by your application

Schema Format

Schemas follow OpenAI’s Structured Outputs format:

{
  "type": "json_schema",
  "json_schema": {
    "name": "document_extraction",
    "schema": {
      "type": "object",
      "properties": {
        "field_name": {
          "type": "string",
          "description": "Clear description of what to extract"
        }
      },
      "required": ["field_name"],
      "additionalProperties": false
    },
    "strict": true
  }
}

Key requirements:

  • Every property must be listed in its object's required array
  • additionalProperties: false must be set on every object, including nested objects and objects inside arrays
  • strict: true ensures 100% schema adherence
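
The first two requirements can be checked mechanically before a schema is saved. The following is a minimal sketch in Python, not part of DocRouter.AI: the function name strict_mode_problems is hypothetical, and it assumes it is given the inner "schema" object rather than the outer json_schema wrapper.

```python
from typing import Any


def strict_mode_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return a list of strict-mode violations found in a JSON Schema node."""
    problems: list[str] = []
    node_type = schema.get("type")

    if node_type == "object":
        properties = schema.get("properties", {})
        required = set(schema.get("required", []))
        # Every declared property must also appear in "required".
        for name in properties:
            if name not in required:
                problems.append(f"{path}.{name}: missing from 'required'")
        # additionalProperties must be explicitly false on every object.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: 'additionalProperties' must be false")
        # Recurse into nested properties.
        for name, child in properties.items():
            problems.extend(strict_mode_problems(child, f"{path}.{name}"))
    elif node_type == "array" and isinstance(schema.get("items"), dict):
        # Arrays are checked through their item schema.
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))

    return problems


good = {
    "type": "object",
    "properties": {
        "field_name": {
            "type": "string",
            "description": "Clear description of what to extract",
        }
    },
    "required": ["field_name"],
    "additionalProperties": False,
}

# Illustrative schema with three mistakes: "address" is not required, and
# neither object sets additionalProperties to false.
bad = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["vendor"],
}

print(strict_mode_problems(good))  # []
for problem in strict_mode_problems(bad):
    print(problem)
```

An empty list means the schema satisfies both structural requirements. The check does not cover strict: true, which sits on the wrapper rather than on the schema itself.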

Field Types

Basic Types

string — Text, names, addresses, formatted numbers

number — Numeric values for calculations

integer — Whole numbers

boolean — True/false values

Complex Types

array — Lists of items (e.g., line items, skills)

object — Nested structures (e.g., address, contact info)
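
As an illustration of how the basic and complex types combine, here is a sketch that builds a schema in Python and prints it in the format shown under Schema Format. The invoice fields (vendor_name, is_paid, billing_address, line_items) are made-up examples, not fields DocRouter.AI defines.

```python
import json

# Hypothetical fields for an invoice; the names are illustrative only.
address = {
    "type": "object",
    "properties": {
        "street": {"type": "string", "description": "Street name and number"},
        "city": {"type": "string", "description": "City name"},
    },
    "required": ["street", "city"],
    "additionalProperties": False,
}

line_item = {
    "type": "object",
    "properties": {
        "description": {"type": "string", "description": "Item description"},
        "quantity": {"type": "integer", "description": "Units purchased"},
        "unit_price": {"type": "number", "description": "Price per unit"},
    },
    "required": ["description", "quantity", "unit_price"],
    "additionalProperties": False,
}

schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "Vendor's legal name"},
        "is_paid": {"type": "boolean", "description": "True if marked as paid"},
        # object: a nested structure with its own required list
        "billing_address": address,
        # array: a list whose items all follow the same object schema
        "line_items": {
            "type": "array",
            "description": "One entry per invoice line",
            "items": line_item,
        },
    },
    "required": ["vendor_name", "is_paid", "billing_address", "line_items"],
    "additionalProperties": False,
}

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "invoice_extraction", "schema": schema, "strict": True},
}

print(json.dumps(response_format, indent=2))
```

Note that the nested object and the array's item object each carry their own required list and additionalProperties: false.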


Best Practices

Clear Descriptions — Write detailed field descriptions that guide the AI on what to extract and expected formats.

Choose Right Types — Use string for formatted values (currency), number for calculations.
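
To make the distinction concrete, the sketch below defines one field of each kind and shows what happens if only the formatted string is available and a calculation is needed later. The field names and the parse_amount helper are hypothetical, and the helper assumes "," is a thousands separator and "." is the decimal point.

```python
import re
from decimal import Decimal

# Hypothetical field definitions: one keeps the printed form, one is for math.
properties = {
    "total_display": {
        "type": "string",
        "description": 'Total exactly as printed, e.g. "$1,234.50"',
    },
    "total_amount": {
        "type": "number",
        "description": "Total as a plain number, e.g. 1234.50",
    },
}


def parse_amount(text: str) -> Decimal:
    """Strip currency symbols and thousands separators, then parse."""
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    return Decimal(cleaned)


print(parse_amount("$1,234.50") * 3)  # 3703.50
```

Declaring the field as number in the schema avoids this parsing step entirely, which is the reason to choose number whenever the value feeds a calculation.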

Keep It Simple — Use basic types only for maximum portability across LLM providers.

All Fields Required — In strict mode, all fields must be in the required array. Missing data returns empty strings or defaults.
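
Because every field is always present, an application that needs to tell "not found" apart from a real value may want to treat empty strings as missing. A minimal sketch, assuming the extraction result has already been parsed into Python dicts and lists; blank_to_none and the sample result are illustrative only.

```python
from typing import Any


def blank_to_none(value: Any) -> Any:
    """Recursively replace empty or whitespace-only strings with None."""
    if isinstance(value, dict):
        return {key: blank_to_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [blank_to_none(item) for item in value]
    if isinstance(value, str) and value.strip() == "":
        return None
    return value


# Made-up extraction result in which two fields had no data in the document.
result = {
    "invoice_number": "INV-7",
    "po_number": "",
    "items": [{"sku": "A1", "note": ""}],
}

print(blank_to_none(result))
# {'invoice_number': 'INV-7', 'po_number': None, 'items': [{'sku': 'A1', 'note': None}]}
```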

Use AI to Design Schemas — It is much simpler to use AI to design effective schemas and prompts than to design them manually. Use the DocRouter.AI MCP server in Claude Code or Cursor to generate and update schemas.


AI-Powered Schema Design

Designing complex JSON schemas manually can be error-prone. We recommend using the DocRouter.AI MCP Server to automate this process.

Using the MCP Server

If you use Claude Code or Cursor, you can connect to our Model Context Protocol (MCP) server to manage schemas using natural language.

  1. Generate Schemas: Ask the AI: “Create a DocRouter schema for medical invoices with patient name, date, and a list of procedures.”
  2. Update Schemas: Ask the AI: “Add a ‘total_tax’ field to my existing Invoice schema.”
  3. Validate: The AI ensures that strict: true, additionalProperties: false, and required arrays are correctly configured.

This approach allows you to focus on what data you need, while the AI handles the how of JSON Schema compliance.
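
For readers who want to see what those three settings amount to, the sketch below applies them by hand to a draft schema. It is not how the MCP server works internally; make_strict and wrap_schema are hypothetical helpers, and the draft fields follow the medical invoice example above.

```python
from typing import Any


def make_strict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy in which every object requires all of its properties
    and forbids additional ones."""
    node = dict(schema)  # shallow copy; nested nodes are rebuilt below
    if node.get("type") == "object":
        props = node.get("properties", {})
        node["properties"] = {name: make_strict(child) for name, child in props.items()}
        node["required"] = list(props)
        node["additionalProperties"] = False
    elif node.get("type") == "array" and isinstance(node.get("items"), dict):
        node["items"] = make_strict(node["items"])
    return node


def wrap_schema(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a schema in the json_schema envelope with strict mode on."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": make_strict(schema), "strict": True},
    }


# Draft with no "required" lists and no "additionalProperties" settings.
draft = {
    "type": "object",
    "properties": {
        "patient_name": {"type": "string"},
        "date": {"type": "string"},
        "procedures": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "code": {"type": "string"},
                    "description": {"type": "string"},
                },
            },
        },
    },
}

wrapped = wrap_schema("medical_invoice", draft)
print(wrapped["json_schema"]["schema"]["required"])
# ['patient_name', 'date', 'procedures']
```

The draft itself is left unchanged, so the helper can be rerun after fields are added or removed.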

