Documentation Overview

Welcome to the OpenWebUI documentation landing page. This parent README links to and summarizes the key guides for:

  • Document Creation & Batch Optimization: Best practices for authoring and preparing high-quality documents and uploading them in batches.
  • Content Retrieval Management: Guidelines for configuring and tuning ingestion, chunking, embedding, and extraction pipelines.

Use this page as your starting point to navigate to detailed instructions tailored to your workflow.


1. Document Creation & Batch Optimization

A guide to writing and preparing documents so they can be ingested efficiently:

  • Focus on Essential Content: Techniques to keep documents concise, structured, and retrieval-friendly.
  • Minimize Historical Noise: Strategies to prune outdated background and avoid retrieval inaccuracies.
  • Pre-Upload & Batch Preparation: Supported formats, naming conventions, manifests, file sizing, and splitting for batch uploads.
  • Quality Checks: Pre-chunk simulations, search testing, and peer reviews to validate content accuracy.
  • Maintenance & Cleanup: Archival, temp file cleanup, and version retention policies.

See full guide: Document Creation & Batch Optimization Best Practices
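
As an illustration of the manifest and file-sizing step listed above, the following is a minimal sketch, not part of OpenWebUI itself. The function name build_manifest, the manifest fields, and the 10 MB threshold are all assumptions made for this example; use whatever limits and fields the full guide and your deployment actually specify.

```python
import hashlib
from pathlib import Path

# Hypothetical size threshold for illustration only; it is not an
# OpenWebUI setting. Replace it with your deployment's real limit.
MAX_FILE_BYTES = 10 * 1024 * 1024


def build_manifest(batch_dir, max_bytes=MAX_FILE_BYTES):
    """Return one manifest entry per regular file in batch_dir.

    Entries are sorted by file name so the manifest is stable
    between runs over the same directory.
    """
    entries = []
    for path in sorted(Path(batch_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-files
        data = path.read_bytes()
        entries.append({
            "name": path.name,
            "bytes": len(data),
            # Checksum lets you detect changed or duplicate files.
            "sha256": hashlib.sha256(data).hexdigest(),
            # Flag files that exceed the threshold and may need splitting.
            "needs_split": len(data) > max_bytes,
        })
    return entries
```

Calling build_manifest on a batch directory returns a list of dictionaries that can be written out with json.dump and reviewed before upload.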


2. Content Retrieval Management

Detailed recommendations for configuring OpenWebUI’s extraction, chunking, and embedding subsystems:

  • Chunk Size & Overlap: Character- and token-based guidelines and tuning tips for different document types.
  • Model Limits & Context Windows: Typical OpenAI and other model context limits to inform chunk strategy.
  • Content Extraction Engines: Overview of Default, Tika, Mistral OCR, Document Intelligence, Docling, and External backends.
  • Embedding Engines & Models: Supported engines (SentenceTransformers, OpenAI, Ollama), model dimensions, and batch size best practices.

See full guide: Content Retrieval Configuration Best Practices
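
To make the chunk size and overlap idea concrete, here is a minimal character-based sketch. It is not OpenWebUI's actual splitter; the function name chunk_text and the default values (1000 characters, 100 overlap) are illustrative assumptions. It can be used to preview roughly how a document would be divided before tuning real settings.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into character windows of chunk_size.

    Consecutive windows share `overlap` characters, so a sentence
    cut at a boundary still appears whole in one of the two chunks
    (as long as it is shorter than the overlap).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text, so no
        # trailing chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, overlap=1) yields three chunks, each beginning with the last character of the previous one. Token-based chunking follows the same pattern, with a tokenizer's token list in place of the character string.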


3. How to Use This Documentation

  1. Determine Your Workflow: Identify whether you’re authoring new documents or tuning ingestion pipelines.
  2. Follow the Relevant Guide: Use the matching "See full guide" reference in section 1 or 2 above to find the detailed steps.
  3. Implement & Validate: Apply best practices, then test with sample documents and queries.
  4. Iterate & Monitor: Use monitoring tools and logs to spot retrieval problems and refine configurations over time.