    QAFKA
    • Introduction
      • Quick Start
      • Configuration
      • React Native Widget
      • Theming
      • Context
      • Navigation
      • External Navigation
      • Handling Tools
      • Voice Chat
      • Sub-Projects
      • Error Handling
      • CLI
      • Dashboard
      • Invitations
      • Settings
        • Project
        • Overview
        • Conversations
        • Chat Test
        • Sub-Projects
        • Analysis
        • Configuration
        • Members
        • Documents
        • Tools
        • Action Logs
        • Navigation Rules
        • External Destinations
        • Chat Theme
        • API Keys
      • API Key Security

    On This Page

    • Document Types
    • Sources
    • FAQ Items
    • AI Document Generation
    • Generation Limits
    • Hard Source Limits
    • Draft → Publish Lifecycle
    • Chunking
    • How Documents Differ from Critical Instructions
    • Tuning Retrieval Quality

    Documents

    Documents are the long-form knowledge base your AI uses to answer user questions. Anything you upload — terms of service, FAQ, product catalog descriptions, internal SOPs — becomes searchable context the AI can ground its answers in.

    Document Types

    Every document carries a type that tells the AI how to use the content. Picking the right type is more important than it looks: the AI follows different rules per type, so a contract loaded as INFORMATION is paraphrased while the same contract loaded as RULE_POLICY is delivered verbatim.

    | Type | What it tells the AI to do | Use it for |
    |---|---|---|
    | INFORMATION (default) | Use the content naturally to form an answer; rephrase and summarize while preserving accuracy | Reference material, product descriptions, narrative explanations |
    | FAQ | Match the user’s question to the closest entry and deliver that answer directly | Question-answer content with discrete entries (see FAQ Items below for the special structure) |
    | GUIDE | Preserve step order; never skip steps; if asked about a specific step, continue forward from there | How-to procedures, onboarding flows, troubleshooting trees |
    | RULE_POLICY | Deliver exactly as written: don’t soften, shorten, add commentary, or rephrase | Legal terms, official policies, regulatory text where the wording matters |

    The type is a strong signal — wrong type leads to wrong tone (a TOS document loaded as INFORMATION gets paraphrased into something the legal team didn’t approve; a chatty FAQ loaded as RULE_POLICY reads like a manifesto). If a single document mixes content kinds, split it into multiple typed documents.
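
As a rough illustration, the four types above can be modeled as a small union with one handling rule each. This is a sketch only: `DocType`, `TYPE_RULES`, `ruleFor`, and `isVerbatim` are hypothetical names, not part of Qafka's API, and the rule strings paraphrase the table.

```typescript
// Hypothetical model of the four document types; not Qafka's actual API.
type DocType = "INFORMATION" | "FAQ" | "GUIDE" | "RULE_POLICY";

// One handling rule per type, paraphrased from the table above.
const TYPE_RULES: Record<DocType, string> = {
  INFORMATION: "Rephrase and summarize while preserving accuracy.",
  FAQ: "Match the question to the closest entry and deliver that answer directly.",
  GUIDE: "Preserve step order and never skip steps.",
  RULE_POLICY: "Deliver exactly as written, with no rephrasing or commentary.",
};

// INFORMATION is the default type when none is given.
function ruleFor(type: DocType = "INFORMATION"): string {
  return TYPE_RULES[type];
}

// Only RULE_POLICY content must be delivered word for word.
function isVerbatim(type: DocType): boolean {
  return type === "RULE_POLICY";
}

// The same contract is handled differently depending on its type.
const asInformation = isVerbatim("INFORMATION"); // may be paraphrased
const asPolicy = isVerbatim("RULE_POLICY"); // delivered as written
```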

    Sources

    A document’s content can come from four places:

    • Manual — paste or write the content directly. Best for short, hand-curated text.
    • PDF upload — extracted via pdf-parse; page separators are stripped.
    • Word (.docx) upload — extracted via mammoth → HTML → markdown for clean structure preservation.
    • Markdown / HTML upload — passed through turndown for normalization.

    For uploads, the file becomes a single document; if the source is large, it’s auto-chunked at indexing time (see Chunking).
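
The routing from an uploaded file to one of the extraction paths above might look roughly like the sketch below. The extension lists are assumptions (the text names formats, not extensions), and `uploadKindFor`, `EXTENSIONS`, and `PIPELINES` are hypothetical names; the real extractors are not called here.

```typescript
// Hypothetical routing of an upload to one of the extraction paths above.
type UploadKind = "pdf" | "docx" | "markup";

// Assumed extension lists; the source names the formats, not the extensions.
const EXTENSIONS: Record<UploadKind, string[]> = {
  pdf: ["pdf"],
  docx: ["docx"],
  markup: ["md", "markdown", "html", "htm"],
};

// Extraction path for each kind, as described in the list above.
const PIPELINES: Record<UploadKind, string> = {
  pdf: "pdf-parse",
  docx: "mammoth -> HTML -> markdown",
  markup: "turndown",
};

// Returns the upload kind for a file name, or null if it is not recognized.
function uploadKindFor(fileName: string): UploadKind | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null;
  const ext = fileName.slice(dot + 1).toLowerCase();
  const kinds: UploadKind[] = ["pdf", "docx", "markup"];
  for (const kind of kinds) {
    if (EXTENSIONS[kind].indexOf(ext) >= 0) return kind;
  }
  return null;
}
```

Anything that returns null here would fall back to the Manual source: paste the text in directly.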

    FAQ Items

    FAQ documents have a special structure: instead of free text, you add FAQ items — explicit Q&A pairs. Each item also accepts question variants, alternate phrasings of the same question (e.g. “how do I cancel?”, “I want to cancel my order”, “iptal nasıl yapılır”). Variants are embedded individually so semantic search finds them regardless of how the user asks.

    This typically gives sharper retrieval than dumping Q&A pairs into a long INFORMATION document, because each entry is its own embedding rather than a sliver of a chunked text.
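
To make the "each entry is its own embedding" point concrete, here is a minimal sketch of how one FAQ item could fan out into separate embedding inputs. The `FaqItem` shape and `faqEmbeddingInputs` are hypothetical, and embedding the main question alongside its variants is an assumption; the text only states that variants are embedded individually.

```typescript
// Hypothetical shape of one FAQ item; field names are illustrative.
interface FaqItem {
  question: string;
  variants: string[];
  answer: string;
}

// One text to embed, pointing back at the item it belongs to.
interface FaqEmbeddingInput {
  itemIndex: number;
  text: string;
}

// Emits the main question and every variant as separate embedding inputs.
function faqEmbeddingInputs(items: FaqItem[]): FaqEmbeddingInput[] {
  const inputs: FaqEmbeddingInput[] = [];
  items.forEach((item, itemIndex) => {
    for (const text of [item.question].concat(item.variants)) {
      inputs.push({ itemIndex, text });
    }
  });
  return inputs;
}

const cancelItem: FaqItem = {
  question: "How do I cancel?",
  variants: ["I want to cancel my order", "iptal nasıl yapılır"],
  answer: "(placeholder answer text)",
};
```

Every phrasing resolves to the same `itemIndex`, so whichever variant matches the user's wording, the same answer is delivered.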

    AI Document Generation

    For sources you don’t want to type by hand — long PDFs, contracts, scraped web pages — Qafka can generate the document for you in two output modes:

    • FAQ mode — extract the source into structured Q&A pairs (with variants), saved as a FAQ document.
    • INFORMATION (Summary) mode — produce a clean narrative summary, saved as an INFORMATION document.

    You provide the source (uploaded file or pasted text), pick the mode, and optionally add a custom prompt and a target output language. The dashboard first runs an estimate (a token and cost preview); once you confirm, it runs the generation, and the output lands as a DRAFT for review.

    Generation Limits

    Limits come from the active subscription plan, not per-document settings:

    • aiDocGenerationMaxTokens — cap on the output of one generation (default 16384). Long sources can still be processed; the output just truncates.
    • aiDocGenerationMonthlyLimit — number of generations allowed per billing cycle. null means unlimited; 0 disables AI generation entirely on that plan.

    The source token estimate is shown in the preview before you commit, so you don’t burn a generation slot on something that’s obviously oversized.
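
The semantics of the two limits can be sketched as follows. The field names come from the text above; the `PlanLimits` object and both helper functions are assumptions made for illustration, not Qafka's API.

```typescript
// Field names come from the text; the PlanLimits object itself is an assumption.
interface PlanLimits {
  aiDocGenerationMaxTokens: number;
  aiDocGenerationMonthlyLimit: number | null;
}

// Decides whether another generation is allowed in this billing cycle.
function canGenerate(plan: PlanLimits, usedThisCycle: number): boolean {
  const limit = plan.aiDocGenerationMonthlyLimit;
  if (limit === null) return true; // null means unlimited
  return usedThisCycle < limit; // a limit of 0 always fails, so generation is disabled
}

// Output longer than the cap is truncated to the cap.
function cappedOutputTokens(plan: PlanLimits, wantedTokens: number): number {
  return Math.min(wantedTokens, plan.aiDocGenerationMaxTokens);
}
```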

    Hard Source Limits

    Sources over ~100k tokens are rejected outright (there is no chunking on the input side yet). Pre-trim very long sources, split them across multiple generations, or upload the file directly as a regular document if you don’t need AI restructuring.
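
Splitting an oversized source before generation could be done along these lines. This is a sketch under stated assumptions: the four-characters-per-token estimate is a rough heuristic, not Qafka's tokenizer, and `roughTokenCount`, `isOverHardLimit`, and `splitSource` are hypothetical helpers.

```typescript
// Approximate hard limit from the text ("~100k tokens").
const HARD_SOURCE_LIMIT_TOKENS = 100000;

// Rough heuristic, NOT Qafka's tokenizer: about 4 characters per token.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

function isOverHardLimit(text: string): boolean {
  return roughTokenCount(text) > HARD_SOURCE_LIMIT_TOKENS;
}

// Greedily packs paragraphs into parts that stay under maxTokens.
// A single paragraph larger than maxTokens is kept whole and stays oversized.
function splitSource(text: string, maxTokens: number): string[] {
  const parts: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    const candidate = current === "" ? para : current + "\n\n" + para;
    if (current !== "" && roughTokenCount(candidate) > maxTokens) {
      parts.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }
  if (current !== "") parts.push(current);
  return parts;
}
```

Each part would then be submitted as its own generation, which counts against the monthly limit separately.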

    Draft → Publish Lifecycle

    AI-generated documents land as DRAFTS. Drafts are visible in the dashboard but not indexed for retrieval — the AI can’t see them yet. This is deliberate: AI output needs review before it shapes user-facing answers.

    Publishing a draft activates it and triggers embedding generation. Once embedded, it joins the retrieval pool for the next user question.

    You can freely edit a draft before publishing. Editing a published document re-embeds the changed content automatically.

    Manually-created or directly-uploaded documents skip the draft phase — they’re published on creation by default.
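
The lifecycle described above can be summarized as a tiny state model. Everything in this sketch is hypothetical (type names, fields, and functions); it only mirrors the rules stated in this section and re-embeds the whole content on edit for simplicity.

```typescript
// Hypothetical lifecycle model; names and fields are illustrative only.
type DocStatus = "DRAFT" | "PUBLISHED";
type DocOrigin = "AI_GENERATED" | "MANUAL" | "UPLOAD";

interface KbDoc {
  origin: DocOrigin;
  docStatus: DocStatus;
  content: string;
  // Content the embeddings were built from; null until first published.
  embeddedContent: string | null;
}

// AI output starts as a draft; manual and uploaded documents publish on creation.
function createDoc(origin: DocOrigin, content: string): KbDoc {
  const isDraft = origin === "AI_GENERATED";
  return {
    origin,
    docStatus: isDraft ? "DRAFT" : "PUBLISHED",
    content,
    embeddedContent: isDraft ? null : content,
  };
}

// Publishing activates the document and triggers embedding.
function publishDoc(doc: KbDoc): KbDoc {
  return { ...doc, docStatus: "PUBLISHED", embeddedContent: doc.content };
}

// Editing a draft changes text only; editing a published document re-embeds.
function editDoc(doc: KbDoc, content: string): KbDoc {
  const reEmbed = doc.docStatus === "PUBLISHED";
  return { ...doc, content, embeddedContent: reEmbed ? content : doc.embeddedContent };
}

// Only published, embedded documents join the retrieval pool.
function isRetrievable(doc: KbDoc): boolean {
  return doc.docStatus === "PUBLISHED" && doc.embeddedContent !== null;
}
```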

    Chunking

    Long documents are auto-chunked when they are indexed so each piece fits in the embedding model’s context window. Chunks share the same parent document but each carries its own embedding, and retrieval surfaces the relevant chunk rather than the whole parent.

    You don’t configure chunking — it happens automatically. The implication is that very long documents can have multiple chunks ranking against each other in retrieval; if you find one chunk consistently misleading the AI, splitting the source into smaller, focused documents gives you finer control.
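
The parent-and-chunk relationship can be pictured with a naive fixed-size splitter. This is not Qafka's chunking strategy, which is not documented here; the word-window approach, the `Chunk` shape, and `chunkByWords` are all assumptions chosen for brevity.

```typescript
// Hypothetical chunk record: each chunk keeps a link to its parent document.
interface Chunk {
  parentId: string;
  index: number;
  text: string;
}

// Naive fixed-size word windows; the real chunker may split differently.
function chunkByWords(parentId: string, text: string, maxWords: number): Chunk[] {
  if (maxWords < 1) throw new Error("maxWords must be at least 1");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push({
      parentId,
      index: chunks.length,
      text: words.slice(i, i + maxWords).join(" "),
    });
  }
  return chunks;
}
```

Because every chunk is embedded and ranked on its own, a long parent contributes several competing candidates to retrieval, which is why splitting the source into focused documents gives finer control.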

    How Documents Differ from Critical Instructions

    Both shape what the AI says, but they work differently:

    • Critical Instructions apply to every message. They cost tokens on every turn and are best for short, always-on rules (“respond in user’s language”, “keep replies under 3 sentences”).
    • Documents are retrieved on demand based on the user’s question. The AI only “sees” the documents relevant to the current query, so a document’s size doesn’t directly cost tokens on unrelated turns. Best for content the AI doesn’t need on every message.

    If you find yourself repeating policy text in Critical Instructions, it usually belongs in a RULE_POLICY document instead.
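
The difference in token cost shows up when you picture how the context for a single turn might be assembled. The sketch below is an assumption about the general shape only: `RetrievedChunk`, the `score` field, `topK`, and `buildTurnContext` are hypothetical, and the real ranking and prompt layout are not described in this page.

```typescript
// Hypothetical retrieval candidate with a relevance score for the current question.
interface RetrievedChunk {
  text: string;
  score: number;
}

// Critical Instructions are always included; documents only when they rank in the top K.
function buildTurnContext(
  criticalInstructions: string[],
  candidates: RetrievedChunk[],
  topK: number
): string[] {
  const ranked = candidates
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
  return criticalInstructions.concat(ranked.map((c) => c.text));
}
```

Every string in `criticalInstructions` is paid for on every turn, while a document chunk costs tokens only on the turns where it is retrieved.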

    Tuning Retrieval Quality

    The AI’s answer quality is roughly bounded by the quality of the documents it retrieves. Three levers, in order of impact:

    • Pick the right type. A RULE_POLICY document forces verbatim delivery; an INFORMATION one allows paraphrasing. Picking wrong is a bigger quality hit than any other lever.
    • Fewer, sharper documents. Many overlapping documents create retrieval ambiguity and dilute the relevant signal. Consolidating duplicate content into one canonical document usually beats adding more.
    • Use FAQ items, not long FAQ narratives. Discrete Q&A entries with variants embed and retrieve better than a wall of “Q: … A: …” text in an INFORMATION document.
    Last updated on June 3, 2026

    MIT 2026 QAFKA