Why QVAC
The five SDK calls that do most of the work, and why they ship together.
QVAC is the SDK running LocalLens's local AI loop. The interesting thing about it for a project this size: one package handles five jobs that would otherwise pull in five different libraries.
The five SDK calls we use
```ts
import {
  loadModel,
  ragChunk,
  ragIngest,
  ragSearch,
  completion,
  // and the lifecycle helpers:
  ragCloseWorkspace,
  ragDeleteWorkspace,
  close,
} from "@qvac/sdk";
```

| Call | Where it's used | What it does |
|---|---|---|
| `loadModel` | `src/qvac.ts` → `ensureReady` | Loads the embedding model (`GTE_LARGE_FP16`) and the chat model (`QWEN3_1_7B_INST_Q4` with a 600M fallback). |
| `ragChunk` | `src/rag.ts` → `chunkDocument` | Splits document text into ~220-token windows with 40-token overlap. |
| `ragIngest` | `src/qvac.ts` → `ingestChunks` | Embeds chunks and stores them in a named workspace. |
| `ragSearch` | `src/qvac.ts` → `search` | Embeds a query and returns top-K matching chunks. |
| `completion` | `src/qvac.ts` → `answer` | Streams a chat completion from the loaded model. |
| `ragCloseWorkspace` | `src/qvac.ts` → `closeWorkspace` | Closes (and optionally deletes) the workspace on disk. |
| `close` | `src/qvac.ts` → `close` | Tears down the QVAC runtime when the app exits. |

That's the whole API surface LocalLens uses. No manual embedding loop, no separate vector database, no custom token splitter.
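
To make the "~220-token windows with 40-token overlap" figure concrete, here is a minimal sketch of overlapping windowing. It is an illustration only, not QVAC's implementation: `slidingWindows` is a hypothetical helper, and it works on any pre-tokenized array because the tokenizer and boundary rules `ragChunk` uses are not shown in this page.

```ts
// Hypothetical helper (not part of @qvac/sdk): split a token array into
// fixed-size windows where each window repeats the tail of the previous one.
function slidingWindows<T>(tokens: T[], size = 220, overlap = 40): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("need size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap; // how far each window advances
  const windows: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    windows.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return windows;
}
```

With the defaults, each window starts 180 tokens after the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.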
Model lifecycle
QVAC models load lazily on first use:
```ts
private async ensureReady(): Promise<void> {
  if (this.chatModelId && this.embeddingModelId) return;
  this.readyPromise ??= this.loadModels().finally(() => {
    this.readyPromise = undefined;
  });
  await this.readyPromise;
}
```

Two consequences worth knowing about:

- The cold-start cost lands on the first question, not on boot. That keeps `bun run dev` snappy and gives you a clear "loading model…" moment to hang a UI hint on.
- Concurrent calls share one in-flight load. Every caller awaits the same `readyPromise` until it resolves. Two requests arriving in the same tick won't both fire `loadModel` for the same source.
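
The shared in-flight load can be tried in isolation. The sketch below is a standalone version of the same pattern, assuming a hypothetical `LazyResource` class and an injected `load` function in place of the real model loading; it does not call the QVAC SDK.

```ts
// Hypothetical stand-in for the gateway's lazy loading. `load` replaces the
// real model load so the pattern can run without any SDK.
class LazyResource<T> {
  private value: T | undefined;
  private pending: Promise<T> | undefined;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  async get(): Promise<T> {
    if (this.value !== undefined) return this.value; // already loaded
    // Start a load only if none is in flight; later callers reuse it.
    this.pending ??= this.load()
      .then((v) => {
        this.value = v;
        return v;
      })
      .finally(() => {
        this.pending = undefined; // cleared on success and on failure
      });
    return this.pending;
  }
}
```

Because the pending promise is cleared in `finally`, a failed load is not cached: the next call starts a fresh attempt instead of re-awaiting a rejected promise.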
Why a fallback model
The default chat model is QWEN3_1_7B_INST_Q4. On older or smaller
machines it can fail to load. The gateway catches that and falls
back to QWEN3_600M_INST_Q4:
```ts
try {
  this.chatModelId = await loadModel({ modelSrc: QWEN3_1_7B_INST_Q4, modelConfig });
} catch {
  this.chatModelId = await loadModel({ modelSrc: QWEN3_600M_INST_Q4, modelConfig });
}
```

The fallback is invisible to callers. `QvacGateway.answer` keeps the
same streaming signature either way.
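
If the list of candidates ever grows past two, the same try-then-fall-back idea generalizes to an ordered list. This is a sketch under assumptions, not code from LocalLens: `loadFirstAvailable` and the `Loader` type are hypothetical, and each loader is expected to wrap one load attempt.

```ts
// Hypothetical generalization of the two-model fallback above.
type Loader<T> = () => Promise<T>;

async function loadFirstAvailable<T>(loaders: Loader<T>[]): Promise<T> {
  let lastError: unknown = new Error("no loaders given");
  for (const load of loaders) {
    try {
      return await load(); // first candidate that loads wins
    } catch (err) {
      lastError = err; // remember why this one failed, then try the next
    }
  }
  throw lastError; // every candidate failed
}
```

Unlike the bare `catch` in the gateway snippet, this version keeps the last error so a total failure still reports a cause. Whether that is worth the extra lines for two models is a judgment call.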
Why one SDK for everything?
LocalLens could have used @xenova/transformers for embeddings,
chromadb or qdrant for vectors, and llama.cpp for completion.
Each is a fine pick on its own. The cost of using all three is the
integration glue you'd have to write: model lifecycle, workspace
lifecycle, error mapping, async iteration. QVAC ships those for
you. That's most of why this app fits in eight files.
Useful upstream pages
- QVAC RAG reference: `ragChunk`, `ragIngest`, `ragSearch`.
- QVAC completion reference: streaming, KV cache, thinking capture.
- QVAC model catalog: supported model IDs.