Documentation Overview

Welcome to the OpenWebUI documentation landing page. This parent README summarizes and links to the key guides for:

Document Creation & Batch Optimization: Best practices for authoring and preparing high-quality documents and uploading them in batches.
Content Retrieval Management: Guidelines for configuring and tuning ingestion, chunking, embedding, and extraction pipelines.

Use this page as your starting point to navigate to detailed instructions tailored to your workflow.

1. Document Creation & Batch Optimization

A guide to authoring and preparing documents so that they ingest efficiently:

Focus on Essential Content: Techniques to keep documents concise, structured, and retrieval-friendly.
Minimize Historical Noise: Strategies to prune outdated background and avoid retrieval inaccuracies.
Pre-Upload & Batch Preparation: Supported formats, naming conventions, manifests, file sizing, and splitting for batch uploads.
Quality Checks: Pre-chunk simulations, search testing, and peer reviews to validate content accuracy.
Maintenance & Cleanup: Archival, temp file cleanup, and version retention policies.
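
As an illustration of the pre-upload preparation step, the sketch below builds a simple manifest for a folder of documents and flags files that may need splitting. It is a minimal sketch under stated assumptions: the manifest fields, the `ALLOWED_SUFFIXES` set, and the `MAX_BYTES` threshold are hypothetical choices made for this example, not formats or limits defined by OpenWebUI. Check the full guide for the actual supported formats and sizing advice.

```python
import hashlib
from pathlib import Path

# Assumptions for illustration only; not OpenWebUI-defined values.
ALLOWED_SUFFIXES = {".md", ".txt", ".pdf", ".docx"}
MAX_BYTES = 10 * 1024 * 1024  # hypothetical per-file size threshold


def build_manifest(folder, max_bytes=MAX_BYTES):
    """Return one manifest entry per accepted file in `folder`, sorted by name."""
    entries = []
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and file types outside the accepted set.
        if not path.is_file() or path.suffix.lower() not in ALLOWED_SUFFIXES:
            continue
        data = path.read_bytes()
        entries.append({
            "name": path.name,
            "bytes": len(data),
            # The checksum lets you detect duplicate or changed files between batches.
            "sha256": hashlib.sha256(data).hexdigest(),
            # Flag files over the threshold as candidates for splitting.
            "needs_split": len(data) > max_bytes,
        })
    return entries
```

A manifest like this can be written out with `json.dump` and reviewed before a batch upload, so oversized or unexpected files are caught early.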

See full guide: Document Creation & Batch Optimization Best Practices

2. Content Retrieval Management

Detailed recommendations for configuring OpenWebUI’s extraction, chunking, and embedding subsystems:

Chunk Size & Overlap: Character- and token-based guidelines and tuning tips for different document types.
Model Limits & Context Windows: Typical context limits for OpenAI and other models, to inform your chunking strategy.
Content Extraction Engines: Overview of Default, Tika, Mistral OCR, Document Intelligence, Docling, and External backends.
Embedding Engines & Models: Supported engines (SentenceTransformers, OpenAI, Ollama), model dimensions, and batch size best practices.
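
To make the chunk size and overlap trade-off concrete, here is a minimal character-based sketch. It is not OpenWebUI's actual splitter, and the function name `chunk_text` and the default values are assumptions chosen for illustration. Each chunk repeats the last `overlap` characters of the previous one, so text near a boundary appears in both chunks.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split `text` into windows of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing chunk
        # that only repeats already-covered text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` returns `["abcd", "defg", "ghij"]`. A larger overlap preserves more context across boundaries at the cost of more chunks to embed and store.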

See full guide: Content Retrieval Configuration Best Practices

3. How to Use These Guides

Determine Your Workflow: Identify whether you’re authoring new documents or tuning ingestion pipelines.
Follow the Relevant Guide: Use the "See full guide" link in the matching section above to reach the detailed steps.
Implement & Validate: Apply best practices, then test with sample documents and queries.
Iterate & Monitor: Use monitoring tools and logs to refine configurations over time.