Why local AI
The motivations behind running retrieval and inference on your own machine.
Hosted chat services are great, until the context is private. Personal notes, a private repo, an unreleased spec — these are the things you wouldn't paste into a public chatbot, and they're exactly what LocalLens is built for.
It optimises for four things.
1. Files never leave the machine
QVAC loads the chat and embedding models locally and runs them in your process. Discovery, chunking, embedding, retrieval, prompting, completion — every step happens on your hardware. Nothing is uploaded.
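To make the retrieval step concrete, here is a minimal sketch of ranking embedded chunks against a query entirely in process. It assumes cosine similarity as the ranking measure, which this page does not confirm, and it assumes the embedding vectors have already been produced by a local model. The names `EmbeddedChunk`, `cosineSimilarity`, and `topK` are hypothetical and are not LocalLens or QVAC APIs.

```typescript
// Hypothetical shape for illustration; not a LocalLens or QVAC type.
interface EmbeddedChunk {
  source: string;   // file path the chunk came from
  text: string;     // chunk contents
  vector: number[]; // embedding assumed to come from a local model
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length (norm).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vector length mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query vector, best first.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Nothing in this sketch touches the network: the ranking is plain arithmetic over arrays already in memory, which is the property the section describes.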
2. Answers stay grounded
The prompt builder in src/rag.ts tells the model to answer only from the retrieved chunks. If the answer isn't in them, the model says so. Bracketed citations like [1] and [2] point back to the exact chunk a claim came from.
This isn't a style choice. Grounded prompts reduce the "confidently wrong" failure mode that general assistants tend to hit when you point them at specialised content.
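The grounding rule described above can be sketched as a small prompt builder plus a citation check. This is an illustration under stated assumptions, not the code in src/rag.ts: the `RetrievedChunk` shape, the function names `buildGroundedPrompt` and `invalidCitations`, and the exact instruction wording are all hypothetical.

```typescript
// Hypothetical shape; the real types in src/rag.ts may differ.
interface RetrievedChunk {
  source: string; // file path the chunk came from
  text: string;   // chunk contents
}

// Build a prompt that numbers each chunk and restricts the model to them.
// The instruction wording is an assumption, not LocalLens's actual prompt.
function buildGroundedPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer using only the numbered context below.",
    "Cite the chunks you rely on with bracketed numbers such as [1].",
    "If the context does not contain the answer, say so instead of guessing.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Return the distinct citation numbers in an answer that match no chunk.
// Valid citations are 1-based and no larger than chunkCount.
function invalidCitations(answer: string, chunkCount: number): number[] {
  const pattern = /\[(\d+)\]/g;
  const invalid: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if ((n < 1 || n > chunkCount) && invalid.indexOf(n) === -1) {
      invalid.push(n);
    }
  }
  return invalid;
}
```

Numbering the chunks in the prompt is what lets a citation like [2] be traced back to one specific chunk. A check along the lines of `invalidCitations` can flag an answer that cites a chunk number that was never supplied.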
3. Cloud-optional
Pull the network plug and LocalLens still works. CLI and browser UI both run offline. The only network traffic is the first-time model download, and QVAC caches that to disk.
If you later want to share a brain with a teammate, you can — but that's a choice you make in code, not a default that ships data out behind your back.
4. Small enough to read
The codebase is eight TypeScript files in src/. You can read all of it in an afternoon and extend it without learning a new framework. There is no plugin system, no abstract repository interface, and no DI container, because none of the problems those solve are real bottlenecks at this size.
Where this stops being enough
LocalLens is not a production document-management system. It's a deliberately small reference. If you need permissions, multi-user access, or proper backup-and-restore, you'll outgrow it, and that's fine. The point is to give you a clear starting line.
Further reading
- QVAC documentation — the SDK doing the heavy lifting.
- Bun documentation — the runtime LocalLens sits on.