2. Chunking and the grounded prompt

src/rag.ts does two jobs: split documents into chunks, and build a prompt that grounds the model in retrieved excerpts. Both are close to pure. Chunking calls QVAC's ragChunk for tokenization, and prompt building is a plain function. Beyond that, the module only knows about the types in domain.ts.

Chunking

src/rag.ts

import { ragChunk } from "@qvac/sdk";
import type { ChatMessage, LocalDocument, SearchHit, TextChunk } from "./domain.ts";

export type ChunkOptions = {
  brainId: string;
  chunkSize?: number;
  chunkOverlap?: number;
};

export async function chunkDocuments(
  documents: LocalDocument[],
  options: ChunkOptions,
): Promise<TextChunk[]> {
  const chunks: TextChunk[] = [];
  for (const document of documents) chunks.push(...(await chunkDocument(document, options)));
  return chunks;
}

export async function chunkDocument(
  document: LocalDocument,
  { brainId, chunkSize = 220, chunkOverlap = 40 }: ChunkOptions,
): Promise<TextChunk[]> {
  if (chunkSize <= chunkOverlap) throw new Error("chunkSize must be greater than chunkOverlap");
  const text = document.content.replace(/\r\n/g, "\n").trim();
  if (!text) return [];

  const qvacChunks = await ragChunk({
    documents: text,
    chunkOpts: {
      chunkSize,
      chunkOverlap,
      chunkStrategy: "paragraph",
      splitStrategy: "token",
    },
  });

  return qvacChunks
    .map((chunk) => chunk.content.trim())
    .filter(Boolean)
    .map((content, chunkIndex) => ({
      id: `${document.checksum.slice(0, 12)}-${chunkIndex}`,
      brainId,
      relativePath: document.relativePath,
      chunkIndex,
      content,
      checksum: document.checksum,
    }));
}

Chunk size and overlap

Defaults: 220 tokens with 40 tokens of overlap. Two reasons those numbers work for documentation-shaped content:

Most paragraphs in a README or note fit comfortably in 220 tokens, so the chunker rarely splits in the middle of a thought.
40 tokens of overlap keeps a sentence that straddles two chunks searchable from either side without bloating the index.
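To put numbers on "without bloating the index", here is a small sketch of the arithmetic for a plain fixed-size sliding window. It is an approximation under one assumption: QVAC's "paragraph" strategy can end a chunk early at a paragraph break, so real chunk counts may be higher. The function names are illustrative, not part of the SDK.

```typescript
// Sketch: how many fixed-size sliding-window chunks a document produces.
// Assumes every chunk is exactly chunkSize tokens (except possibly the last);
// paragraph-aware chunking can cut chunks shorter, giving more of them.
function estimateChunks(totalTokens: number, chunkSize: number, chunkOverlap: number): number {
  const step = chunkSize - chunkOverlap; // new tokens each window adds
  if (step <= 0) throw new Error("chunkSize must be greater than chunkOverlap");
  if (totalTokens <= 0) return 0;
  if (totalTokens <= chunkSize) return 1;
  // The first window covers chunkSize tokens; each later window covers `step` more.
  return 1 + Math.ceil((totalTokens - chunkSize) / step);
}

// Extra indexed tokens relative to the document's own length, for a long
// document (ignores the edge effect of the final, possibly partial, window).
function overlapOverhead(chunkSize: number, chunkOverlap: number): number {
  return chunkOverlap / (chunkSize - chunkOverlap);
}

console.log(estimateChunks(1000, 220, 40)); // 6
console.log(overlapOverhead(220, 40).toFixed(2)); // 0.22
```

With the defaults each window moves forward 180 tokens, so a long file is indexed with roughly 22% duplicated text, and a 1,000-token file lands in about six chunks.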

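The runtime check that chunkSize is greater than chunkOverlap fails fast on a configuration that can't work. A sliding window advances by chunkSize - chunkOverlap tokens per chunk; if that step is zero or negative, the window never moves forward. Rather than depend on how the QVAC SDK handles such input, chunkDocument throws before calling it.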
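The step arithmetic is easiest to see in a toy chunker. This sketch slides a fixed window over an array of tokens (single-letter words here). It is not how ragChunk works internally, since the real chunker uses its own tokenizer and respects paragraph boundaries; slidingWindow is a hypothetical helper.

```typescript
// Toy fixed-size sliding window over pre-split tokens, for illustration only.
function slidingWindow<T>(tokens: T[], chunkSize: number, chunkOverlap: number): T[][] {
  const step = chunkSize - chunkOverlap;
  // With step <= 0 the start index would never advance and the loop below
  // would never terminate, so reject the configuration up front.
  if (step <= 0) throw new Error("chunkSize must be greater than chunkOverlap");
  const windows: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    windows.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // this window reached the end
  }
  return windows;
}

const words = "a b c d e f g h i j".split(" ");
console.log(slidingWindow(words, 4, 1).map((w) => w.join("")));
// [ 'abcd', 'defg', 'ghij' ]
```

Each window starts chunkOverlap tokens before the previous one ended, so d and g each appear in two chunks. With chunkSize equal to chunkOverlap, step would be 0 and start would stay at 0 forever, which is exactly the case the guard rejects.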

Stable IDs

The chunk ID combines the document checksum and the chunk index:

id: `${document.checksum.slice(0, 12)}-${chunkIndex}`

Two consequences:

Reindexing the same file produces the same chunk IDs, which makes incremental reindexing tractable later.
Editing a single byte changes the checksum and therefore every chunk ID for that file. That's intentional: an edit can shift chunk boundaries, so all of the file's old chunks and their embeddings are treated as stale.
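One way to check both properties is to compute IDs the same way chunkDocument does. This sketch assumes the checksum is a hex SHA-256 of the file content computed with node:crypto; the section doesn't say which hash domain.ts uses. checksumOf and chunkIds are hypothetical helpers, and the chunk count is passed in directly instead of coming from ragChunk.

```typescript
import { createHash } from "node:crypto";

// Assumption: the document checksum is a hex SHA-256 digest of its content.
function checksumOf(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Same ID scheme as chunkDocument: first 12 hex chars of the checksum, then the index.
function chunkIds(content: string, chunkCount: number): string[] {
  const prefix = checksumOf(content).slice(0, 12);
  return Array.from({ length: chunkCount }, (_, chunkIndex) => `${prefix}-${chunkIndex}`);
}

const before = chunkIds("# Setup\nRun npm install.", 2);
const again = chunkIds("# Setup\nRun npm install.", 2);
const edited = chunkIds("# Setup\nRun pnpm install.", 2);

console.log(before.join() === again.join()); // true: same content, same IDs
console.log(before.some((id) => edited.includes(id))); // false: one edit changes every ID
```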

The grounded prompt

export function buildGroundedHistory(question: string, hits: SearchHit[]): ChatMessage[] {
  const context = hits
    .map(
      (hit, index) => `[${index + 1}] ${hit.relativePath}#chunk-${hit.chunkIndex}\n${hit.content}`,
    )
    .join("\n\n---\n\n");

  return [
    {
      role: "system",
      content: [
        "You are LocalLens, a local-first file chat assistant that answers questions strictly from the provided source excerpts.",
        "Rules:",
        "1. Only use facts that appear in the excerpts. If the answer is not in them, say so plainly.",
        "2. Refer to source excerpts inline using bracketed numbers, for example [1] or [2], when you use them.",
        "3. Answer in the same language as the user's question.",
        "4. Keep answers focused and concrete. No filler.",
        "5. Do not include hidden reasoning, chain-of-thought, or thinking tags.",
      ].join(" "),
    },
    {
      role: "user",
      content: `Source excerpts:\n\n${context || "No matching chunks were found."}\n\nQuestion:\n${question}`,
    },
  ];
}

That's the whole anti-hallucination strategy. Five system rules, plus a user turn that lays the excerpts out as numbered evidence.

The "no matching chunks" fallback

If hits is empty, the user message contains the literal string "No matching chunks were found." in place of the excerpt block. The first system rule then applies, and the model should say plainly that the answer isn't in the sources rather than guess.

Numbered excerpts and citations

Each hit becomes:

[N] relative/path#chunk-<index>
<chunk content>

The model echoes these labels back as inline citations such as [1] or [2]. Because N is the 1-based position in hits, the caller can map each citation to hits[N - 1] and use its relativePath to render a sources list that links back to the file.
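On the rendering side, a caller might turn the model's inline citations into a sources list like this. citedSources and the Hit type are hypothetical (Hit stands in for the subset of SearchHit fields used here). The sketch keeps the first occurrence of each number and drops any [N] outside 1..hits.length, since a model can cite a number that matches no excerpt in the prompt.

```typescript
// Stand-in for the SearchHit fields this sketch needs (assumption).
type Hit = { relativePath: string; chunkIndex: number };
type CitedSource = { n: number; hit: Hit };

// Hypothetical helper: map [N] markers in an answer back to hits[N - 1].
function citedSources(answer: string, hits: Hit[]): CitedSource[] {
  const pattern = /\[(\d+)\]/g; // matches [1], [2], [12], ...
  const seen = new Set<number>();
  const sources: CitedSource[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    const hit = hits[n - 1]; // the prompt labelled hits[N - 1] as [N]
    // Skip repeats and numbers that don't point at an excerpt we sent.
    if (n < 1 || hit === undefined || seen.has(n)) continue;
    seen.add(n);
    sources.push({ n, hit });
  }
  return sources;
}

const hits: Hit[] = [
  { relativePath: "README.md", chunkIndex: 0 },
  { relativePath: "docs/setup.md", chunkIndex: 3 },
];
const answer = "Install with npm [2]. It runs offline [1][2]. See also [7].";
for (const { n, hit } of citedSources(answer, hits)) {
  console.log(`[${n}] ${hit.relativePath}#chunk-${hit.chunkIndex}`);
}
// [2] docs/setup.md#chunk-3
// [1] README.md#chunk-0
```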

Why no chat history?

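Each call to buildGroundedHistory produces a fresh two-message history based on the current question and search hits. Earlier turns don't carry forward. That's how LocalLens stays grounded across a session: every question gets its own round of evidence retrieval. The trade-off is that a follow-up such as "what about the second option?" can't lean on the previous turn and has to name its subject.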
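The stateless design shows up in the shape of a per-question call. In this sketch the search, history builder, and completion call are injected as hypothetical dependencies (in LocalLens they would presumably be the vector search, buildGroundedHistory, and the QVAC completion call); the fakes exist only so the example runs on its own. ask accepts no previous messages, so each turn sends the model exactly what the history builder returns.

```typescript
// Minimal local types (assumption: a subset of what domain.ts defines).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type Hit = { relativePath: string; chunkIndex: number; content: string };

// Hypothetical injected dependencies.
type AskDeps = {
  search: (question: string) => Promise<Hit[]>;
  buildHistory: (question: string, hits: Hit[]) => ChatMessage[];
  complete: (history: ChatMessage[]) => Promise<string>;
};

// Retrieve, build a fresh history, complete. No earlier turns are passed in.
async function ask(question: string, deps: AskDeps): Promise<{ answer: string; hits: Hit[] }> {
  const hits = await deps.search(question);
  const history = deps.buildHistory(question, hits);
  const answer = await deps.complete(history);
  return { answer, hits };
}

// Fakes that record how many messages the model receives per turn.
const historyLengths: number[] = [];
const fakeDeps: AskDeps = {
  search: async (question) => [
    { relativePath: "README.md", chunkIndex: 0, content: `Notes about ${question}` },
  ],
  buildHistory: (question, hits) => [
    { role: "system", content: "Only use facts from the excerpts." },
    {
      role: "user",
      content: `${hits.map((h, i) => `[${i + 1}] ${h.content}`).join("\n")}\n\nQuestion:\n${question}`,
    },
  ],
  complete: async (history) => {
    historyLengths.push(history.length);
    return "See [1].";
  },
};

ask("install", fakeDeps)
  .then(() => ask("configuration", fakeDeps))
  .then(() => console.log(historyLengths)); // [ 2, 2 ]: no growth across turns
```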

What you can run after this step

Tests live in tests/prompt.test.ts and tests/chunker.test.ts. Both pass with just domain.ts and rag.ts in place.

Next: the QVAC gateway, where chunking and prompting meet model loading and inference.