Add better diagnostics
Surface model state, indexing progress, file and chunk counts, and real latency numbers from the QVAC profiler.
Where it belongs: src/qvac.ts for model state and the
profiler bridge, src/server.ts for exposure, src/ui/app.js
for display.
LocalLens hides most of its lifecycle from the user. That's fine when everything is fast. It's annoying when a model takes a minute to load and the UI shows nothing. A small diagnostics surface fixes this without overcomplicating the core.
Two kinds of signal are worth showing:
- Operational state — what is: loading vs loaded, file and chunk counts, the active chat model. The gateway already knows these; you just need to expose them.
- Performance metrics — how fast: model load times,
embedding latency, RAG search and ingest timings, completion
throughput. The QVAC SDK ships a
profiler utility that captures all of these without any timing code on your end.
The two are complementary. Operational state tells you what's happening now. The profiler tells you how long things took.
What to expose
- Model loading state. Loading / loaded / failed, plus which models.
- Indexing progress. Bytes read, chunks created, current file.
- Brain stats. File count, chunk count, embedding model name.
- Active model. Which chat model is in use right now (the 1.7B or the 600M fallback).
- Latency aggregates. Last and average timing for
loadModel, embed, completion, ragIngest, ragSearch.
The first four come from the gateway. The fifth comes from the profiler.
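To make the split concrete, here is a rough sketch of what the combined payload might look like. The interface names (ModelStatus, BrainStats, DiagnosticsPayload) and the allModelsLoaded helper are illustrative assumptions, not existing LocalLens types. The field names mirror the ones used in the recipe that follows, and the profiler output is typed as unknown because its exact JSON shape isn't spelled out here.

```typescript
// Illustrative types only; names are assumptions, not existing LocalLens exports.
type ModelState = "unloaded" | "loading" | "loaded" | "error";

interface ModelStatus {
  state: ModelState;
  modelId?: string; // stays undefined until the model has loaded
}

interface BrainStats {
  id: string;
  name: string;
  fileCount: number;
  chunkCount: number;
}

interface DiagnosticsPayload {
  // Operational state: comes from the gateway.
  qvac: { chat: ModelStatus; embedding: ModelStatus };
  brains: BrainStats[];
  // Performance metrics: whatever the profiler exports, passed through as-is.
  metrics: unknown;
}

// Hypothetical helper: true once every model reports "loaded".
function allModelsLoaded(payload: DiagnosticsPayload): boolean {
  return Object.values(payload.qvac).every((model) => model.state === "loaded");
}
```

A UI could use a check like allModelsLoaded to decide when to hide a loading badge.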
The recipe
1. Add a status getter on the gateway
type ModelState = "unloaded" | "loading" | "loaded" | "error";
export class QvacGateway {
// existing fields
private state: { chat: ModelState; embedding: ModelState } = {
chat: "unloaded",
embedding: "unloaded",
};
status() {
return {
chat: { state: this.state.chat, modelId: this.chatModelId },
embedding: { state: this.state.embedding, modelId: this.embeddingModelId },
};
}
// wrap loadModel calls to update state
private async loadModels(): Promise<void> {
this.state.embedding = "loading";
try {
this.embeddingModelId ??= await loadModel({ modelSrc: GTE_LARGE_FP16 });
this.state.embedding = "loaded";
} catch (e) {
this.state.embedding = "error";
throw e;
}
// … same pattern for chat …
}
}

The status getter is read-only and side-effect-free, so it's safe to call from the diagnostics endpoint without touching the model lifecycle.
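The "same pattern for chat" comment means the try/catch gets written twice. One way to avoid the repetition is a small tracker that wraps any load call. This is a sketch under assumptions: ModelStateTracker and its track method are hypothetical names, not part of LocalLens or the QVAC SDK, and the loader is passed in as a plain async function so the example runs without the SDK.

```typescript
type ModelState = "unloaded" | "loading" | "loaded" | "error";
type ModelKind = "chat" | "embedding";

// Hypothetical helper; not part of LocalLens or @qvac/sdk.
class ModelStateTracker {
  private state: Record<ModelKind, ModelState> = {
    chat: "unloaded",
    embedding: "unloaded",
  };

  // Read-only view of one model's current state.
  get(kind: ModelKind): ModelState {
    return this.state[kind];
  }

  // Runs `load`, recording loading -> loaded on success
  // and loading -> error on failure.
  async track<T>(kind: ModelKind, load: () => Promise<T>): Promise<T> {
    this.state[kind] = "loading";
    try {
      const result = await load();
      this.state[kind] = "loaded";
      return result;
    } catch (error) {
      this.state[kind] = "error";
      throw error; // rethrow so callers still see the failure
    }
  }
}
```

Inside the gateway, each existing loadModel call would be wrapped in a closure and handed to track, and status() would read from the tracker instead of a hand-maintained field.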
2. Turn on the QVAC profiler
@qvac/sdk ships a profiler utility that records timing across
model loads, completions, embeddings, and RAG operations. Enable
it once at boot and let it capture every call the gateway makes:
import { profiler } from "@qvac/sdk";
// near the top of the file, alongside chatModelConfig
profiler.enable();

profiler.enable() is a global switch. Every subsequent SDK call
records events. If you'd rather opt in per call, every QVAC
function also accepts { profiling: { enabled: true } } in its
options, and you can opt out selectively with
{ profiling: { enabled: false } }.
Three export shapes are available:
| Method | Returns | Best for |
|---|---|---|
| profiler.exportSummary() | High-level summary string | Logging on shutdown. |
| profiler.exportTable() | Detailed aggregated table | Reading in a terminal. |
| profiler.exportJSON() | { aggregates, recentEvents, config } | Serving over HTTP for the UI. |
JSON is what the diagnostics endpoint should return. The UI can format it however it likes.
Add a tiny helper on the gateway so callers don't have to import
profiler directly:
metrics() {
return profiler.exportJSON();
}

Profiler doesn't measure everything
The profiler captures latency for SDK calls. It doesn't directly measure indexing progress, file or chunk counts, or which chat model is currently active. Those come from the gateway's own state, which is why this page keeps both surfaces. The profiler complements the status getter — it doesn't replace it.
3. Add a /api/diagnostics route
if (url.pathname === "/api/diagnostics" && request.method === "GET") {
return json({
qvac: app.diagnostics(),
metrics: app.metrics(),
brains: (await app.listBrains()).map((b) => ({
id: b.id,
name: b.name,
status: b.status,
fileCount: b.fileCount,
chunkCount: b.chunkCount,
lastIndexedAt: b.lastIndexedAt,
lastError: b.lastError,
})),
});
}

app.qvac is private. Either expose diagnostics() and
metrics() on LocalLensApp that proxy through, or make the
gateway protected. Proxying is cleaner — keep LocalLensApp as
the only thing the server talks to:
diagnostics() {
return this.qvac.status();
}
metrics() {
return this.qvac.metrics();
}

4. Indexing progress
Indexing currently runs as a single async call. To stream progress, switch the workflow to a generator:
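async *createBrainFromFolderProgress(input: CreateBrainFromFolderInput): AsyncGenerator<ProgressEvent, Brain> {
// Setup elided: folderPath, brain, and indexed are assumed to be derived
// the same way as in the existing non-streaming workflow.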
yield { type: "discovery", message: "Walking folder…" };
const documents = await discoverTextDocuments(folderPath);
yield { type: "discovered", fileCount: documents.length };
yield { type: "chunking" };
const chunks = await chunkDocuments(documents, { brainId: brain.id });
yield { type: "chunked", chunkCount: chunks.length };
yield { type: "ingesting" };
await this.qvac.ingestChunks(brain.workspace, chunks);
yield { type: "ingested" };
return indexed;
}

The server then exposes it as a Server-Sent Events stream:
if (url.pathname === "/api/brains/from-files/stream") {
const stream = new ReadableStream({ /* yield events */ });
return new Response(stream, { headers: { "content-type": "text/event-stream" } });
}

The UI consumes the SSE stream and updates a progress bar.
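The /* yield events */ placeholder in the route could be filled in along these lines. This is a minimal sketch, not the LocalLens implementation: sseFrame and toSseStream are hypothetical helpers, IndexingEvent stands in for the page's ProgressEvent type, and the "progress", "done", and "error" event names are assumptions. The generator is driven by hand with next() rather than for await, because for await discards the generator's return value (the finished Brain).

```typescript
// Stand-in for the page's ProgressEvent type; renamed here to avoid
// clashing with the DOM's built-in ProgressEvent.
type IndexingEvent = { type: string; [key: string]: unknown };

// Formats one SSE frame: an event name, a JSON data line, and a blank line.
function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data ?? null)}\n\n`;
}

// Wraps a progress generator in a ReadableStream of encoded SSE frames.
// Each pull advances the generator by one step.
function toSseStream<TReturn>(
  progress: AsyncGenerator<IndexingEvent, TReturn>,
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      try {
        const step = await progress.next();
        if (step.done) {
          // The generator's return value becomes the final frame.
          controller.enqueue(encoder.encode(sseFrame("done", step.value)));
          controller.close();
        } else {
          controller.enqueue(encoder.encode(sseFrame("progress", step.value)));
        }
      } catch (error) {
        // Report the failure to the client instead of silently dropping the stream.
        const message = error instanceof Error ? error.message : String(error);
        controller.enqueue(encoder.encode(sseFrame("error", { message })));
        controller.close();
      }
    },
  });
}
```

The route would then pass the generator returned by createBrainFromFolderProgress to toSseStream and hand the result to new Response with the text/event-stream content type. On the client, the built-in EventSource only issues GET requests, so if the route accepts a POST body the UI would need to read the stream through fetch instead.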
5. Render it
Three new render functions in app.js:
function renderDiagnostics(diagnostics) {
// model state badge in the topbar
// brain table in a side panel
}
function renderMetrics(metrics) {
// "Last search: 230 ms · avg 280 ms"
// "Embedding model loaded in 4.2s"
// — pulled from metrics.aggregates and metrics.recentEvents
}
function renderIndexingProgress(event) {
// progress bar updates from SSE events
}

Keep the panel collapsible. Most users don't want it open by default.
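As a starting point for renderMetrics, the formatting step might look like the sketch below. It assumes a hypothetical LatencyAggregate shape with lastMs and avgMs fields. The actual structure of metrics.aggregates isn't documented on this page, so check the profiler's JSON output and map it into this shape before relying on these names. The type annotations can be dropped when pasting into app.js.

```typescript
// Hypothetical shape; the real profiler aggregate fields may differ.
interface LatencyAggregate {
  lastMs: number;
  avgMs: number;
}

// Whole milliseconds below one second, one-decimal seconds from there up.
function formatDuration(ms: number): string {
  const rounded = Math.round(ms);
  return rounded < 1000 ? `${rounded} ms` : `${(rounded / 1000).toFixed(1)}s`;
}

// Builds a one-line summary such as "Last search: 230 ms · avg 280 ms".
function formatLatencyLine(label: string, aggregate: LatencyAggregate): string {
  const last = formatDuration(aggregate.lastMs);
  const avg = formatDuration(aggregate.avgMs);
  return `Last ${label}: ${last} · avg ${avg}`;
}
```

Keeping the formatting in pure functions like these makes the render code easy to test without a DOM.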
Diagnostics are read-only
None of these endpoints should change state. If a future feature needs to restart a model or evict a brain, those are separate routes. Diagnostics shows what's happening. It doesn't intervene.
What you don't need to change
- domain.ts: types stay the same.
- rag.ts: chunking and prompts are unaffected.
- store.ts: the JSON shape is unchanged. The diagnostics endpoint reads what's already there.
External references
- QVAC profiler reference: full API including enable, disable, exportSummary, exportTable, exportJSON, and the per-call profiling option.