LocalLens

Why local AI

The motivations behind running retrieval and inference on your own machine.

Hosted chat services are great, until the context is private. Personal notes, a private repo, an unreleased spec — these are the things you wouldn't paste into a public chatbot, and they're exactly what LocalLens is built for.

It optimises for four things.

1. Files never leave the machine

QVAC loads the chat and embedding models locally and runs them in your process. Discovery, chunking, embedding, retrieval, prompting, completion — every step happens on your hardware. Nothing is uploaded.

2. Answers stay grounded

The prompt builder in src/rag.ts tells the model to answer only from the retrieved chunks, and to say so when the answer isn't in them. Bracketed citations like [1] and [2] point back to the exact chunk a claim came from.

This isn't a style choice. Grounded prompts cut the "confidently wrong" failure mode that hits general assistants whenever you point them at specialised content.
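To make the grounding idea concrete, here is a minimal sketch of what such a prompt builder could look like. It is not the code in src/rag.ts: the Chunk shape, the function names buildGroundedPrompt and resolveCitations, and the instruction wording are all assumptions made for illustration.

```typescript
// Hypothetical chunk shape; the real type in src/rag.ts may differ.
interface Chunk {
  source: string; // file the chunk was read from
  text: string;   // chunk contents
}

// Build a prompt that restricts the model to the numbered chunks.
// The instruction wording is illustrative, not the project's own.
function buildGroundedPrompt(question: string, chunks: Chunk[]): string {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.source})\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer using only the numbered context below.",
    "Cite the chunks you rely on with bracketed numbers such as [1].",
    "If the context does not contain the answer, say that you cannot find it.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Map bracketed citations in an answer back to the chunks they reference.
// Numbers that fall outside the retrieved set are ignored.
function resolveCitations(answer: string, chunks: Chunk[]): Chunk[] {
  const cited: number[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n >= 1 && n <= chunks.length && cited.indexOf(n) === -1) {
      cited.push(n);
    }
  }
  return cited.sort((a, b) => a - b).map((n) => chunks[n - 1]);
}
```

Numbering the chunks in the prompt is what makes the citations checkable: a [2] in the answer can be traced to the second retrieved chunk, and a citation with no matching chunk is a sign the model strayed from the context.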

3. Cloud-optional

Once the models are on disk, pull the network plug and LocalLens still works. CLI and browser UI both run offline. The only network traffic is the first-time model download, which QVAC caches to disk.

If you later want to share a brain with a teammate, you can — but that's a choice you make in code, not a default that ships data out behind your back.

4. Small enough to read

Eight TypeScript files in src/. You can read the whole codebase in an afternoon and extend it without learning a new framework. There is no plugin system, no abstract repository interface, and no DI container, because none of them would solve a real problem at this size.

Where this stops being enough

LocalLens is not a production document-management system. It's a deliberately small reference. If you need permissions, multi-user access, or proper backup-and-restore, you'll outgrow it, and that's fine. The point is to give you a clear starting line.
