Knowledge Bases
Knowledge bases provide context for better extraction
Upload reference documents so the AI can use them to improve accuracy and understand your specific terminology.
Get started in 3 steps
Create a knowledge base
Go to Knowledge Bases in the sidebar, click New Knowledge Base, and give it a name.
Upload reference documents
Add PDF, TXT, or Markdown files containing your reference materials (price lists, catalogs, terminology guides).
Test the knowledge base by chatting with it
Open the knowledge base and use the chat interface to ask questions and verify it's working correctly.
To use your knowledge base with extraction prompts, select it in the prompt editor. The AI will automatically use it when processing documents.
What are Knowledge Bases?
Knowledge bases are collections of reference documents that help the AI understand your specific terminology, rules, and context. They use RAG (Retrieval-Augmented Generation) to find relevant information and include it in the AI’s context.
- Better accuracy: AI uses your reference materials to improve extraction
- Industry terminology: Teach the AI specific terms used in your business
- Reference data: Provide price lists, SKU catalogs, vendor IDs, or guidelines
- Chat agents: Use knowledge bases as the backend for chat agents so users can ask questions over your documents
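To make the retrieval step concrete, here is a minimal sketch of the RAG idea described above. It is not the product's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Place the retrieved chunks in the model's context ahead of the question."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical reference chunks, as they might come from a price list or vendor guide.
chunks = [
    "SKU A-100 is a steel bracket priced at 4.50 USD",
    "Vendor ID 7731 belongs to Acme Fasteners",
    "Invoices must list the purchase order number",
]
print(build_prompt("what is the price of SKU A-100", chunks, top_k=1))
```

A real knowledge base would swap the toy `embed` for the configured embedding model and search a vector index instead of sorting every chunk.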
Configuration Settings
When creating a knowledge base, you can configure how documents are indexed and searched. Some settings are immutable after creation (you’ll need to create a new KB to change them), while others can be updated anytime.
Basic Settings
- Name (required) — A descriptive name for your knowledge base
- Description (optional) — Additional context about the knowledge base’s purpose
- Tags (optional) — Documents with matching tags are automatically indexed into this knowledge base
Indexing Configuration (Immutable)
These settings determine how documents are split into chunks and embedded. They cannot be changed after creation — create a new knowledge base if you need different settings.
Chunker Type
How documents are split into chunks:
- recursive (default) — Smart splitting that respects document structure
- token — Split by token count
- word — Split by word boundaries
- sentence — Split by sentence boundaries
Chunk Size
Default: 512 tokens
Range: 50-2000 tokens. Larger chunks preserve more context but may be less precise for retrieval.
Chunk Overlap
Default: 128 tokens
Range: 0 to (chunk_size - 1). Overlap helps preserve context across chunk boundaries.
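As an illustration of how chunk size and overlap interact, the following sketch splits a token list with a sliding window. It assumes the simplest possible behavior (roughly what a token chunker might do) and is not the product's chunking code; `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 512, chunk_overlap: int = 128
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    # Overlap must stay below chunk_size, otherwise the window would never advance.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Small numbers make the overlap easy to see.
demo = chunk_tokens([str(i) for i in range(10)], chunk_size=4, chunk_overlap=1)
print(demo)
```

With the defaults (512 and 128), each new chunk starts 384 tokens after the previous one, so the last 128 tokens of one chunk reappear at the start of the next.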
Embedding Model
Default: text-embedding-3-small
The model used to create vector embeddings. Available models depend on your configured LLM providers.
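One way to picture the immutable indexing settings is a frozen configuration object: changing a value means building a new configuration (and, in the product, a new knowledge base). This is a sketch under that assumption; `IndexingConfig` and its field names are hypothetical, and only the defaults and ranges come from the settings above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

CHUNKER_TYPES = {"recursive", "token", "word", "sentence"}


@dataclass(frozen=True)
class IndexingConfig:
    """Hypothetical container for the immutable indexing settings."""

    chunker_type: str = "recursive"
    chunk_size: int = 512
    chunk_overlap: int = 128
    embedding_model: str = "text-embedding-3-small"

    def __post_init__(self) -> None:
        # Validate against the documented options and ranges.
        if self.chunker_type not in CHUNKER_TYPES:
            raise ValueError(f"unknown chunker type: {self.chunker_type}")
        if not 50 <= self.chunk_size <= 2000:
            raise ValueError("chunk_size must be between 50 and 2000")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")


config = IndexingConfig()
try:
    config.chunk_size = 1024  # in-place changes are rejected
except FrozenInstanceError:
    print("indexing settings cannot be modified in place")

# Different settings mean a new configuration object.
larger = replace(config, chunk_size=1024)
print(larger)
```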
Search Configuration (Mutable)
These settings control how search results are returned and can be updated anytime.
- Coalesce Neighbors (default: 0, range: 0-5) — Number of neighboring chunks to include with each search result for additional context. Set to 0 to return only matched chunks.
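The sketch below shows one plausible reading of Coalesce Neighbors: each matched chunk is expanded by up to N chunks on either side, clipped at the document boundaries. Whether the setting counts neighbors per side or in total is an assumption here, and `coalesce_neighbors` is a hypothetical function name.

```python
def coalesce_neighbors(
    matched: list[int], total_chunks: int, neighbors: int = 0
) -> list[list[int]]:
    """Expand each matched chunk index with its neighboring chunk indices."""
    if not 0 <= neighbors <= 5:
        raise ValueError("neighbors must be between 0 and 5")

    results = []
    for idx in matched:
        first = max(0, idx - neighbors)                # clip at the first chunk
        last = min(total_chunks - 1, idx + neighbors)  # clip at the last chunk
        results.append(list(range(first, last + 1)))
    return results


# A document with 10 chunks where chunks 0 and 5 matched the query.
print(coalesce_neighbors([0, 5], total_chunks=10, neighbors=1))
```

Higher values give the model more surrounding text per hit at the cost of a larger context.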
Reconciliation Settings (Mutable)
Reconciliation automatically fixes drift between document tags and knowledge base indexes, detecting:
- Missing documents — Documents with matching tags that aren’t indexed
- Stale documents — Indexed documents that no longer have matching tags
- Orphaned vectors — Vector embeddings without corresponding documents
- Reconcile Enabled (default: false) — Enable automatic periodic reconciliation
- Reconcile Interval (minimum: 60 seconds) — How often reconciliation runs when enabled
You can also manually trigger reconciliation at any time from the knowledge base details page.
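The three drift categories can be expressed as set comparisons. The following is a sketch, not the product's reconciliation logic: it assumes a document matches when it shares at least one tag with the knowledge base, and the function and parameter names are hypothetical.

```python
def plan_reconciliation(
    kb_tags: set[str],
    document_tags: dict[str, set[str]],  # document id -> its tags
    indexed_docs: set[str],              # document ids currently in the index
    vector_owners: dict[str, str],       # vector id -> owning document id
) -> dict[str, set[str]]:
    """Classify drift between document tags and the knowledge base index."""
    # Assumption: sharing any one tag with the knowledge base counts as a match.
    matching = {doc for doc, tags in document_tags.items() if tags & kb_tags}

    return {
        # Tagged for this knowledge base but not yet indexed.
        "missing": matching - indexed_docs,
        # Indexed, still existing, but no longer carrying a matching tag.
        "stale": {d for d in indexed_docs if d in document_tags and d not in matching},
        # Vectors whose owning document no longer exists.
        "orphaned": {v for v, doc in vector_owners.items() if doc not in document_tags},
    }


plan = plan_reconciliation(
    kb_tags={"pricing"},
    document_tags={"d1": {"pricing"}, "d2": {"legal"}, "d3": {"pricing", "2024"}},
    indexed_docs={"d1", "d2"},
    vector_owners={"v1": "d1", "v2": "d2", "v3": "d9"},
)
print(plan)
```

In this example, `d3` would be indexed, `d2` would be removed from the index, and vector `v3` would be deleted.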
Managing Knowledge Bases
Creating Knowledge Bases
Go to Knowledge Bases in the sidebar, click New Knowledge Base, configure indexing settings, and assign tags.
Note: Indexing settings (chunker type, chunk size, chunk overlap, embedding model) cannot be changed after creation.
Automatic Indexing
Documents with matching tags are automatically indexed into the knowledge base. No manual upload needed — just tag your documents.
Using with Prompts
In the prompt editor, select a knowledge base from the dropdown. The AI will automatically use it when processing documents.
Reconciliation
Enable periodic reconciliation to automatically fix drift between document tags and indexes, or trigger it manually.
Best Practices
✓ Keep It Focused — Create separate knowledge bases for different topics (e.g., "Legal Terms" vs "Product Specs").
✓ Clean Data — Ensure reference documents are clear, well-formatted, and up-to-date for best results.
✓ Update Regularly — Keep price lists, catalogs, and reference materials current to maintain accuracy.
✓ Use Markdown for Chat Agents — When building knowledge bases as a chat agent backend, use documents in Markdown format for better structure and readability.
✓ Use for Chat — Knowledge bases can power chat agents, allowing users to ask natural-language questions over your documents.
Learn More
- Prompts — Link knowledge bases to your extraction prompts
- Chat Agents — Build chat interfaces on top of your knowledge bases
- REST API — Create and manage knowledge bases programmatically