Add better diagnostics

Where it belongs: src/qvac.ts for model state and the profiler bridge, src/locallens.ts for the app-level proxies and the indexing progress generator, src/server.ts for exposure, src/ui/app.js for display.

LocalLens hides most of its lifecycle from the user. That's fine when everything is fast. It's annoying when a model takes a minute to load and the UI shows nothing. A small diagnostics surface fixes this without overcomplicating the core.

Two kinds of signal are worth showing:

Operational state (what is happening now): loading vs loaded, file and chunk counts, the active chat model. The gateway already knows these; you just need to expose them.
Performance metrics (how fast): model load times, embedding latency, RAG search and ingest timings, completion throughput. The QVAC SDK ships a profiler utility that captures all of these without any timing code on your end.

The two are complementary. Operational state tells you what's happening now. The profiler tells you how long things took.

What to expose

Model loading state. Loading / loaded / failed, plus which models.
Indexing progress. Bytes read, chunks created, current file.
Brain stats. File count, chunk count, embedding model name.
Active model. Which chat model is in use right now (the 1.7B or the 600M fallback).
Latency aggregates. Last and average timing for loadModel, embed, completion, ragIngest, ragSearch.

The first four come from the gateway. The fifth comes from the profiler.

The recipe

1. Add a status getter on the gateway

src/qvac.ts

type ModelState = "unloaded" | "loading" | "loaded" | "error";

export class QvacGateway {
  // existing fields
  private state: { chat: ModelState; embedding: ModelState } = {
    chat: "unloaded",
    embedding: "unloaded",
  };

  status() {
    return {
      chat: { state: this.state.chat, modelId: this.chatModelId },
      embedding: { state: this.state.embedding, modelId: this.embeddingModelId },
    };
  }

  // wrap loadModel calls to update state
  private async loadModels(): Promise<void> {
    this.state.embedding = "loading";
    try {
      this.embeddingModelId ??= await loadModel({ modelSrc: GTE_LARGE_FP16 });
      this.state.embedding = "loaded";
    } catch (e) {
      this.state.embedding = "error";
      throw e;
    }
    // … same pattern for chat …
  }
}

The status getter is read-only and side-effect-free, so it's safe to call from the diagnostics endpoint without touching the model lifecycle.
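The "same pattern for chat" comment above means repeating the try/catch for every model. One way to avoid the duplication is a small wrapper that owns the state transitions. This is a minimal sketch: trackLoad is a hypothetical helper name, not part of @qvac/sdk, and the setState callback is an assumption about how the gateway would hand over its state field.

```typescript
type ModelState = "unloaded" | "loading" | "loaded" | "error";

// Hypothetical helper (not from the QVAC SDK): runs `load`, reports each
// state transition through `setState`, and rethrows any failure unchanged.
async function trackLoad<T>(
  setState: (state: ModelState) => void,
  load: () => Promise<T>,
): Promise<T> {
  setState("loading");
  try {
    const result = await load();
    setState("loaded");
    return result;
  } catch (error) {
    setState("error");
    throw error;
  }
}
```

Inside loadModels(), each model would then be a single statement along the lines of this.chatModelId ??= await trackLoad((s) => { this.state.chat = s; }, () => loadModel(/* existing chat config */)). Because ??= short-circuits, an already-loaded model keeps its "loaded" state without passing through "loading" again.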

2. Turn on the QVAC profiler

@qvac/sdk ships a profiler utility that records timing across model loads, completions, embeddings, and RAG operations. Enable it once at boot and let it capture every call the gateway makes:

src/qvac.ts

import { profiler } from "@qvac/sdk";

// near the top of the file, alongside chatModelConfig
profiler.enable();

profiler.enable() is a global switch. Every subsequent SDK call records events. If you'd rather opt in per call, every QVAC function also accepts { profiling: { enabled: true } } in its options, and you can opt out selectively with { profiling: { enabled: false } }.

Three export shapes are available:

Method	Returns	Best for
`profiler.exportSummary()`	High-level summary string	Logging on shutdown.
`profiler.exportTable()`	Detailed aggregated table	Reading in a terminal.
`profiler.exportJSON()`	`{ aggregates, recentEvents, config }`	Serving over HTTP for the UI.

JSON is what the diagnostics endpoint should return. The UI can format it however it likes.

Add a tiny helper on the gateway so callers don't have to import profiler directly:

src/qvac.ts

metrics() {
  return profiler.exportJSON();
}

Profiler doesn't measure everything

The profiler captures latency for SDK calls. It doesn't directly measure indexing progress, file or chunk counts, or which chat model is currently active. Those come from the gateway's own state, which is why this page keeps both surfaces. The profiler complements the status getter — it doesn't replace it.

3. Add a `/api/diagnostics` route

src/server.ts

if (url.pathname === "/api/diagnostics" && request.method === "GET") {
  return json({
    qvac: app.diagnostics(),
    metrics: app.metrics(),
    brains: (await app.listBrains()).map((b) => ({
      id: b.id,
      name: b.name,
      status: b.status,
      fileCount: b.fileCount,
      chunkCount: b.chunkCount,
      lastIndexedAt: b.lastIndexedAt,
      lastError: b.lastError,
    })),
  });
}

app.qvac is private. Either add diagnostics() and metrics() methods on LocalLensApp that proxy through to the gateway, or make the gateway field public. Proxying is cleaner: it keeps LocalLensApp as the only thing the server talks to:

src/locallens.ts

diagnostics() {
  return this.qvac.status();
}

metrics() {
  return this.qvac.metrics();
}

4. Indexing progress

Indexing currently runs as a single async call. To stream progress, switch the workflow to a generator:

src/locallens.ts

async *createBrainFromFolderProgress(input: CreateBrainFromFolderInput): AsyncGenerator<ProgressEvent, Brain> {
  // folderPath, brain, and indexed are produced the same way as in the
  // existing non-streaming workflow; that setup is elided here.
  yield { type: "discovery", message: "Walking folder…" };
  const documents = await discoverTextDocuments(folderPath);
  yield { type: "discovered", fileCount: documents.length };

  yield { type: "chunking" };
  const chunks = await chunkDocuments(documents, { brainId: brain.id });
  yield { type: "chunked", chunkCount: chunks.length };

  yield { type: "ingesting" };
  await this.qvac.ingestChunks(brain.workspace, chunks);
  yield { type: "ingested" };

  // the generator's return value is the finished brain record
  return indexed;
}
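The generator above yields six event shapes but leaves the ProgressEvent type undefined. A discriminated union keeps the server and the UI in agreement about them. The sketch below uses the hypothetical name IndexingEvent, partly because ProgressEvent is also the name of a built-in DOM type and would clash in any file that includes the DOM typings. The stage weights in progressFraction are arbitrary assumptions, not measured values.

```typescript
// One variant per event yielded by the generator; field names follow the
// yields above. `IndexingEvent` is a hypothetical name for this sketch.
type IndexingEvent =
  | { type: "discovery"; message: string }
  | { type: "discovered"; fileCount: number }
  | { type: "chunking" }
  | { type: "chunked"; chunkCount: number }
  | { type: "ingesting" }
  | { type: "ingested" };

// Maps an event to a coarse 0..1 fraction for a progress bar.
// The weights are illustrative: ingestion is assumed to dominate.
function progressFraction(event: IndexingEvent): number {
  switch (event.type) {
    case "discovery":
      return 0;
    case "discovered":
    case "chunking":
      return 0.2;
    case "chunked":
    case "ingesting":
      return 0.4;
    case "ingested":
      return 1;
  }
}
```

Because the switch covers every variant, adding a new event type to the union makes the compiler flag progressFraction until the new case is handled.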

The server then exposes it as a Server-Sent Events stream:

if (url.pathname === "/api/brains/from-files/stream") {
  const stream = new ReadableStream({ /* yield events */ });
  return new Response(stream, { headers: { "content-type": "text/event-stream" } });
}
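The /* yield events */ placeholder can be filled by pulling from the async generator one event at a time. The following is a minimal sketch, assuming each event is sent as JSON on a single data: line; sseFrame and toSseStream are hypothetical helper names, and the "done" and "error" frame shapes are assumptions of this sketch, not part of LocalLens.

```typescript
// Encodes one event as an SSE frame: a "data:" line followed by a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one line.
function sseFrame(event: unknown): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Wraps an async generator in a byte stream of SSE frames.
function toSseStream<T>(events: AsyncGenerator<T, unknown>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data, so a slow client
    // slows event delivery instead of filling an unbounded queue.
    async pull(controller) {
      try {
        const { value, done } = await events.next();
        if (done) {
          // The generator's return value goes out as a final frame.
          controller.enqueue(encoder.encode(sseFrame({ type: "done", result: value })));
          controller.close();
        } else {
          controller.enqueue(encoder.encode(sseFrame(value)));
        }
      } catch (error) {
        controller.enqueue(encoder.encode(sseFrame({ type: "error", message: String(error) })));
        controller.close();
      }
    },
    // If the client disconnects, stop the generator so its finally blocks run.
    async cancel() {
      await events.return(undefined);
    },
  });
}
```

With this in place, the route body could become new Response(toSseStream(app.createBrainFromFolderProgress(input)), { headers: … }), where input is whatever the route parses from the request.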

The UI consumes the SSE stream and updates a progress bar.
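On the client, the browser's EventSource only issues GET requests, so if the route takes a POST body the UI would read the fetch response stream and split it into frames itself. The sketch below shows only that splitting step. It assumes the server sends JSON on data: lines and separates frames with "\n\n"; takeSseEvents is a hypothetical helper name, written in TypeScript to match the rest of this page even though app.js is plain JavaScript.

```typescript
// Splits buffered SSE text into complete events plus the unfinished tail.
// Assumes JSON payloads on "data:" lines and "\n\n" frame separators.
function takeSseEvents(buffer: string): { events: unknown[]; rest: string } {
  const frames = buffer.split("\n\n");
  // The last piece is either empty or a frame still being received.
  const rest = frames.pop() ?? "";
  const events: unknown[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // Per the SSE format, one optional space after the colon is dropped.
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data.length > 0) events.push(JSON.parse(data));
  }
  return { events, rest };
}
```

The read loop would decode each chunk with a TextDecoder in streaming mode, append it to rest, call takeSseEvents again, and pass every returned event to renderIndexingProgress.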

5. Render it

Three new render functions in app.js:

function renderDiagnostics(diagnostics) {
  // model state badge in the topbar
  // brain table in a side panel
}

function renderMetrics(metrics) {
  // "Last search: 230 ms · avg 280 ms"
  // "Embedding model loaded in 4.2s"
  // — pulled from metrics.aggregates and metrics.recentEvents
}

function renderIndexingProgress(event) {
  // progress bar updates from SSE events
}

Keep the panel collapsible. Most users don't want it open by default.
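The strings in the renderMetrics comments can come from a small pure formatter, which keeps the DOM code trivial. This is a sketch under a stated assumption: the LatencyAggregate shape (lastMs, avgMs) is invented for illustration, so check the profiler reference for the real field names inside metrics.aggregates before wiring it up. It is written in TypeScript to match the rest of this page; dropping the type annotations gives the app.js version.

```typescript
// Assumed shape of one aggregate entry; the real exportJSON() fields may differ.
interface LatencyAggregate {
  lastMs: number;
  avgMs: number;
}

// Shows sub-second values in milliseconds and longer ones in seconds.
function formatDuration(ms: number): string {
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)} s` : `${Math.round(ms)} ms`;
}

// Builds one display line, with a placeholder for operations not yet recorded.
function formatLatency(label: string, aggregate: LatencyAggregate | undefined): string {
  if (!aggregate) return `${label}: no data yet`;
  return `Last ${label}: ${formatDuration(aggregate.lastMs)} · avg ${formatDuration(aggregate.avgMs)}`;
}
```

For example, formatLatency("search", { lastMs: 230, avgMs: 280 }) returns "Last search: 230 ms · avg 280 ms". The undefined branch matters at startup, when the panel is open but no search has run yet.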

Diagnostics are read-only

None of these endpoints should change state. If a future feature needs to restart a model or evict a brain, those are separate routes. Diagnostics shows what's happening. It doesn't intervene.

What you don't need to change

domain.ts — types stay the same.
rag.ts — chunking and prompts unaffected.
store.ts — the JSON shape is unchanged. The diagnostics endpoint reads what's already there.

External references

QVAC profiler reference — full API including enable, disable, exportSummary, exportTable, exportJSON, and the per-call profiling option.
