Knowledge Bases

Knowledge bases provide context for better extraction

Upload reference documents so the AI can use them to improve accuracy and understand your specific terminology.

Get started in 3 steps

Create a knowledge base

Go to Knowledge Bases in the sidebar, click New Knowledge Base, and give it a name.

Upload reference documents

Add PDF, TXT, or Markdown files containing your reference materials (price lists, catalogs, terminology guides).

Test the knowledge base by chatting with it

Open the knowledge base and use the chat interface to ask questions whose answers you already know, then verify that the responses reflect the documents you uploaded.

To use your knowledge base with extraction prompts, select it in the prompt editor. The AI will automatically use it when processing documents.

What are Knowledge Bases?

Knowledge bases are collections of reference documents that help the AI understand your specific terminology, rules, and context. They use RAG (Retrieval-Augmented Generation) to find relevant information and include it in the AI’s context.

Better accuracy: AI uses your reference materials to improve extraction
Industry terminology: Teach the AI specific terms used in your business
Reference data: Provide price lists, SKU catalogs, vendor IDs, or guidelines
Chat agents: Use knowledge bases as the backend for chat agents so users can ask questions over your documents
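The retrieval step of RAG can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the function names (`embed`, `retrieve`, `build_context`) and sample chunks are invented for illustration; they are not part of the product.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_context(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Place the retrieved chunks ahead of the question, as a RAG prompt would."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, top_k))
    return f"Reference material:\n{context}\n\nQuestion: {query}"


# Hypothetical reference chunks, made up for this example
chunks = [
    "SKU A-100 is the stainless steel bracket, list price 4.50 USD.",
    "Vendor ID 7731 refers to Northwind Supplies.",
    "Net 30 means payment is due 30 days after the invoice date.",
]
top = retrieve("What is the price of SKU A-100?", chunks, top_k=1)
print(build_context("What is the price of SKU A-100?", chunks, top_k=1))
```

A real knowledge base would use dense vectors from the configured embedding model and a vector index, but the flow is the same: embed the query, rank chunks by similarity, and pass the best matches to the model as context.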

Configuration Settings

When creating a knowledge base, you configure how documents are indexed and searched. Indexing settings are immutable after creation (to change them, create a new knowledge base), while search and reconciliation settings can be updated anytime.

Basic Settings

Name (required) — A descriptive name for your knowledge base
Description (optional) — Additional context about the knowledge base’s purpose
Tags (optional) — Document tags that automatically index documents into this knowledge base

Indexing Configuration (Immutable)

These settings determine how documents are split into chunks and embedded. They cannot be changed after creation — create a new knowledge base if you need different settings.

Chunker Type

How documents are split into chunks:

recursive (default) — Smart splitting that respects document structure
token — Split by token count
word — Split by word boundaries
sentence — Split by sentence boundaries
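As a rough illustration of what a recursive chunker does, the sketch below splits on the coarsest separator first (paragraphs) and only falls back to finer ones (lines, sentences, words) for pieces that are still too long. The separator list, the word-based size limit, and the function name `recursive_split` are assumptions for this example; the product's actual chunker may behave differently, for instance by merging small pieces back together.

```python
def recursive_split(
    text: str,
    max_words: int,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split on the coarsest separator first, recursing into oversized pieces."""
    stripped = text.strip()
    if len(stripped.split()) <= max_words:
        # Small enough already: keep the piece whole (drop empty pieces)
        return [stripped] if stripped else []
    if not separators:
        # No separators left: fall back to a hard cut by word count
        words = stripped.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    head, rest = separators[0], separators[1:]
    pieces: list[str] = []
    for part in stripped.split(head):
        pieces.extend(recursive_split(part, max_words, rest))
    return pieces


doc = "Heading one\n\nFirst sentence here. Second sentence follows here.\n\nShort."
pieces = recursive_split(doc, max_words=4)
```

In this sketch the heading and the short paragraph stay intact, and only the long middle paragraph is broken down further at its sentence boundary.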

Chunk Size

Default: 512 tokens

Range: 50-2000 tokens. Larger chunks preserve more context but may be less precise for retrieval.

Chunk Overlap

Default: 128 tokens

Range: 0 to (chunk_size - 1). Overlap helps preserve context across chunk boundaries.
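The interaction between chunk size and overlap can be sketched as a sliding window: each chunk starts `chunk_size - overlap` tokens after the previous one, so the defaults (512 and 128) give a stride of 384 tokens. The function name `chunk_with_overlap` is hypothetical, and tokens are represented as a plain list of strings rather than real tokenizer output.

```python
def chunk_with_overlap(
    tokens: list[str], chunk_size: int = 512, overlap: int = 128
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    stride = chunk_size - overlap  # how far each new chunk advances
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


tokens = [str(i) for i in range(10)]
windows = chunk_with_overlap(tokens, chunk_size=4, overlap=1)
```

Here the last token of each window reappears as the first token of the next, which is how overlap keeps a sentence that straddles a boundary retrievable from either side.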

Embedding Model

Default: text-embedding-3-small

The model used to create vector embeddings. Available models depend on your configured LLM providers.
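The documented defaults and ranges for the indexing settings can be captured in a small validation sketch. The class name `IndexingConfig` and its field names are assumptions for illustration and are not taken from the product's API; the frozen dataclass simply mirrors the fact that these settings cannot change after creation.

```python
from dataclasses import dataclass

# Chunker types listed in the documentation
CHUNKER_TYPES = {"recursive", "token", "word", "sentence"}


@dataclass(frozen=True)  # frozen: mirrors "immutable after creation"
class IndexingConfig:
    # Field names are hypothetical; defaults follow the documented values
    chunker_type: str = "recursive"
    chunk_size: int = 512
    chunk_overlap: int = 128
    embedding_model: str = "text-embedding-3-small"

    def __post_init__(self) -> None:
        if self.chunker_type not in CHUNKER_TYPES:
            raise ValueError(f"unknown chunker type: {self.chunker_type!r}")
        if not 50 <= self.chunk_size <= 2000:
            raise ValueError("chunk_size must be between 50 and 2000 tokens")
        if not 0 <= self.chunk_overlap <= self.chunk_size - 1:
            raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")


default_config = IndexingConfig()
small_chunks = IndexingConfig(chunker_type="sentence", chunk_size=200, chunk_overlap=20)
```

Checking values before creating a knowledge base is worthwhile because a mistake in these settings can only be corrected by creating a new one.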

Search Configuration (Mutable)

These settings control how search results are returned and can be updated anytime.

Coalesce Neighbors (default: 0, range: 0-5) — Number of neighboring chunks to include with each search result for additional context. Set to 0 to return only matched chunks.
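One plausible reading of this setting is sketched below: each matched chunk is expanded by N chunks on each side, and expanded ranges that touch or overlap are merged so no chunk is returned twice. Both the per-side interpretation and the merging step are assumptions, as is the function name `coalesce_neighbors`; the product may implement this differently.

```python
def coalesce_neighbors(
    hit_indexes: list[int], total_chunks: int, neighbors: int = 0
) -> list[tuple[int, int]]:
    """Expand each hit by `neighbors` chunks per side and merge touching ranges.

    Returns inclusive (start, end) chunk index ranges in document order.
    """
    if not 0 <= neighbors <= 5:
        raise ValueError("neighbors must be between 0 and 5")
    ranges: list[tuple[int, int]] = []
    for i in sorted(set(hit_indexes)):
        start = max(0, i - neighbors)                # clamp at the first chunk
        end = min(total_chunks - 1, i + neighbors)   # clamp at the last chunk
        if ranges and start <= ranges[-1][1] + 1:
            # Touches or overlaps the previous range: extend it
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges


expanded = coalesce_neighbors([2, 4, 9], total_chunks=10, neighbors=1)
matched_only = coalesce_neighbors([2, 4, 9], total_chunks=10, neighbors=0)
```

With one neighbor, the hits at chunks 2 and 4 grow into a single range covering chunks 1 through 5, while the hit at chunk 9 is clamped at the end of the document.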

Reconciliation Settings (Mutable)

Reconciliation automatically fixes drift between document tags and knowledge base indexes, detecting:

Missing documents — Documents with matching tags that aren’t indexed
Stale documents — Indexed documents that no longer have matching tags
Orphaned vectors — Vector embeddings without corresponding documents

Two settings control automatic reconciliation:

Reconcile Enabled (default: false) — Enable automatic periodic reconciliation
Reconcile Interval (minimum: 60 seconds) — How often reconciliation runs when enabled

You can also manually trigger reconciliation at any time from the knowledge base details page.
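The three kinds of drift can be expressed as set differences. The sketch below assumes documents are identified by string IDs and that each vector records the ID of the document it came from; the function and parameter names are hypothetical.

```python
def detect_drift(
    tagged_docs: set[str],     # documents whose tags match the knowledge base
    indexed_docs: set[str],    # documents currently indexed
    vector_doc_ids: set[str],  # document IDs referenced by stored vectors
    all_docs: set[str],        # every document that still exists
) -> dict[str, set[str]]:
    """Classify drift between tags, the index, and the vector store."""
    return {
        "missing": tagged_docs - indexed_docs,   # should be indexed but is not
        "stale": indexed_docs - tagged_docs,     # indexed but no longer tagged
        "orphaned": vector_doc_ids - all_docs,   # vectors with no document
    }


drift = detect_drift(
    tagged_docs={"a", "b", "c"},
    indexed_docs={"b", "c", "d"},
    vector_doc_ids={"b", "c", "d", "x"},
    all_docs={"a", "b", "c", "d"},
)
```

In this example document "a" would be indexed, document "d" would be removed from the index, and the vectors pointing at the nonexistent document "x" would be deleted.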

Managing Knowledge Bases

Creating Knowledge Bases

Go to Knowledge Bases in the sidebar, click New Knowledge Base, configure indexing settings, and assign tags.

Note: Indexing settings (chunker type, chunk size, chunk overlap, embedding model) cannot be changed after creation.

Automatic Indexing

Documents with matching tags are automatically indexed into the knowledge base. No manual upload needed — just tag your documents.
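Tag-based routing can be pictured with a short sketch. It assumes that a document "matches" a knowledge base when the two share at least one tag, which the documentation does not state explicitly; the function name and the sample tags are invented for illustration.

```python
def knowledge_bases_for(doc_tags: set[str], kb_tags: dict[str, set[str]]) -> list[str]:
    """Return the names of knowledge bases whose tags overlap the document's tags.

    Assumption: sharing at least one tag counts as a match.
    """
    return sorted(name for name, tags in kb_tags.items() if doc_tags & tags)


# Hypothetical knowledge bases and their assigned tags
kb_tags = {
    "Legal Terms": {"legal", "contract"},
    "Product Specs": {"spec", "catalog"},
}
targets = knowledge_bases_for({"contract", "2024"}, kb_tags)
```

Under this assumption, a document tagged "contract" is indexed into "Legal Terms" only, and a document with no overlapping tags is indexed nowhere.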

Using with Prompts

In the prompt editor, select a knowledge base from the dropdown. The AI will automatically use it when processing documents.

Reconciliation

Enable periodic reconciliation to automatically fix drift between document tags and indexes, or trigger it manually.

Best Practices

✓ Keep It Focused — Create separate knowledge bases for different topics (e.g., "Legal Terms" vs "Product Specs").

✓ Clean Data — Ensure reference documents are clear, well-formatted, and up-to-date for best results.

✓ Update Regularly — Keep price lists, catalogs, and reference materials current to maintain accuracy.

✓ Use Markdown for Chat Agents — When building knowledge bases as a chat agent backend, use documents in Markdown format for better structure and readability.

✓ Use for Chat — Knowledge bases can power chat agents, allowing users to ask natural-language questions over your documents.

Learn More

Prompts — Link knowledge bases to your extraction prompts
Chat Agents — Build chat interfaces on top of your knowledge bases
REST API — Create and manage knowledge bases programmatically
