Add PDF and image parsing with QVAC OCR
Reuse the QVAC OCR task to turn scanned pages and images into LocalDocuments.
Where it belongs: src/qvac.ts for the OCR call, src/files.ts
for the adapter glue. The rest of the pipeline doesn't need to
know.
LocalLens currently indexes plain-text formats: Markdown, source code, JSON, YAML. To bring images and scanned PDFs into the same brain, you don't need a third-party parser. The QVAC SDK ships an OCR task you can wire up alongside the chat and embedding models.
The official QVAC OCR reference lives at
docs.qvac.tether.io/sdk/examples/ai-tasks/ocr.
The recipe below assumes you have it open.
The QVAC OCR surface
import {
  loadModel,
  ocr,
  unloadModel,
  OCR_LATIN_RECOGNIZER_1,
} from "@qvac/sdk";
A single model (OCR_LATIN_RECOGNIZER_1) drives the recognizer.
ocr({ modelId, image, options }) returns an object whose blocks
property is a promise. That promise resolves to an array of
{ text, bbox?, confidence? } objects.
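To make the block shape concrete, here is a minimal sketch of turning a resolved array into plain text. The OcrBlock interface is written by hand from the description above and may not match the SDK's own type; blocksToText and its minConfidence threshold are hypothetical names, and the 0 to 1 confidence scale is an assumption.

```typescript
// Hand-written from the description above; the SDK's exported type may differ.
interface OcrBlock {
  text: string;
  bbox?: unknown; // shape not specified here, so left opaque
  confidence?: number; // assumed to be on a 0..1 scale
}

// Joins recognized blocks into newline-separated text, dropping blank
// blocks and blocks scored below minConfidence.
function blocksToText(blocks: OcrBlock[], minConfidence = 0): string {
  return blocks
    // Blocks without a score are kept rather than guessed at.
    .filter((b) => b.confidence === undefined || b.confidence >= minConfidence)
    .map((b) => b.text.trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

const sample: OcrBlock[] = [
  { text: "Invoice 42", confidence: 0.98 },
  { text: "  ", confidence: 0.9 },
  { text: "~~smudge~~", confidence: 0.12 },
  { text: "Total: 10 USD" },
];

console.log(blocksToText(sample, 0.5));
```

With the threshold at 0.5 the low-confidence smudge and the blank block drop out, leaving two lines of text.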
The four-step recipe
1. Add an OCR method to the gateway
Extend QvacGateway so it loads the OCR model lazily, alongside
the chat and embedding models, and exposes a single extractText
helper:
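import {
  loadModel,
  ocr,
  unloadModel,
  OCR_LATIN_RECOGNIZER_1,
} from "@qvac/sdk";

export class QvacGateway {
  // existing fields…
  private ocrModelId: string | undefined;
  // In-flight load, shared so concurrent callers trigger only one loadModel call.
  private ocrLoading: Promise<string> | undefined;

  private async ensureOcrReady(): Promise<void> {
    if (this.ocrModelId) return;
    // Start the load once; later callers await the same promise.
    const loading = (this.ocrLoading ??= loadModel({
      modelSrc: OCR_LATIN_RECOGNIZER_1,
      modelType: "ocr",
      modelConfig: {
        langList: ["en"],
        useGPU: true,
        timeout: 30000,
      },
    }));
    try {
      this.ocrModelId = await loading;
    } finally {
      // Drop the handle so a failed load can be retried.
      if (this.ocrLoading === loading) this.ocrLoading = undefined;
    }
  }

  async extractText(imagePath: string): Promise<string> {
    await this.ensureOcrReady();
    // ocr() returns immediately; the recognized blocks arrive via the blocks promise.
    const { blocks } = ocr({
      modelId: required(this.ocrModelId, "QVAC OCR model is not loaded."),
      image: imagePath,
      options: { paragraph: false },
    });
    const result = await blocks;
    // One line per non-empty block.
    return result
      .map((block) => block.text)
      .filter((line) => line.trim().length > 0)
      .join("\n");
  }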
}
Same pattern as the rest of the gateway: load lazily on first use,
guard the model ID with the required helper, and expose one method
per task. The OCR model is independent of chat and embedding, so its
load doesn't block the rest of the pipeline.
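Lazy loading has one subtlety worth spelling out: if two callers reach the gateway before the first load finishes, each can start its own load unless they share the pending promise. A minimal sketch of that sharing follows. lazyOnce, fakeLoad and getOcrModelId are hypothetical names, and fakeLoad stands in for the real loadModel call so the example runs without the SDK.

```typescript
// Hypothetical helper: wraps an async loader so every caller shares one
// in-flight promise instead of starting a second load.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        // Forget a failed load so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Stand-in for a model load; a real gateway would call loadModel here.
let loadCalls = 0;
const fakeLoad = async (): Promise<string> => {
  loadCalls += 1;
  return "ocr-model-id";
};

const getOcrModelId = lazyOnce(fakeLoad);
const first = getOcrModelId();
const second = getOcrModelId(); // issued before the first load has settled

console.log(first === second, loadCalls);
```

Both calls receive the same promise and the loader runs once, which matters as soon as several pages are sent to OCR concurrently.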
2. Branch on extension when reading
Update discoverTextDocuments (and browserDocumentsFromInput)
so image-shaped files run through OCR before becoming a
LocalDocument. The LocalDocument shape stays the same. Only
the source of the content changes.
import { QvacGateway } from "./qvac.ts";

const ocrExtensions = new Set([".bmp", ".jpg", ".jpeg", ".png", ".tiff"]);

export async function discoverTextDocuments(
  rootPath: string,
  gateway: QvacGateway,
): Promise<LocalDocument[]> {
  // … existing folder walk …
  const ext = path.posix.extname(absolutePath).toLowerCase();
  let content: string;
  if (ocrExtensions.has(ext)) {
    // Image files: the recognized text becomes the document content.
    content = await gateway.extractText(absolutePath);
  } else {
    // Text files: an unreadable file falls back to "" and is filtered out later.
    content = await readFile(absolutePath, "utf8").catch(() => "");
  }
  // … rest unchanged …
}
supportedExtensions grows by the OCR set. The rules below it
(no null bytes, ≤2 MB, non-empty content) keep filtering exactly
the same way. An OCR'd image with no recognized text is skipped,
just like an empty text file.
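As a rough illustration of those rules, the sketch below merges the two extension sets and applies the three checks to a candidate document. It is not the code in src/files.ts: textExtensions is a hypothetical stand-in for the existing list, isIndexable is an invented name, "2 MB" is read as 2 MiB, and bytes is assumed to be the file size on disk.

```typescript
const MAX_BYTES = 2 * 1024 * 1024; // "≤2 MB" read as 2 MiB (assumption)

// Hypothetical stand-in for the existing text extensions in src/files.ts.
const textExtensions = [".md", ".ts", ".json", ".yaml"];
const ocrExtensions = [".bmp", ".jpg", ".jpeg", ".png", ".tiff"];
const supportedExtensions = new Set([...textExtensions, ...ocrExtensions]);

// Applies the same filter to every document, whether its content came from
// disk text or from OCR.
function isIndexable(ext: string, content: string, bytes: number): boolean {
  if (!supportedExtensions.has(ext.toLowerCase())) return false;
  if (bytes > MAX_BYTES) return false;
  if (content.includes("\0")) return false; // null byte: treat as binary
  return content.trim().length > 0; // empty OCR output is skipped here
}

console.log(isIndexable(".png", "Recognized heading", 48_000));
console.log(isIndexable(".png", "", 48_000));
```

Because the filter only looks at the extension, size and content, an image that produced no text falls out through the same branch as an empty text file.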
3. Handle PDFs page by page
QVAC OCR takes an image, not a PDF. Convert each PDF page to an
image first — any rasteriser works (pdftoppm, pdf-poppler, or
a Node binding) — then call gateway.extractText per page and
join the results:
async function extractPdfText(filePath: string, gateway: QvacGateway): Promise<string> {
  // rasterisePdfToTempImages wraps whichever rasteriser you chose and
  // returns one temp image per page, in page order.
  const pageImages = await rasterisePdfToTempImages(filePath); // [path1.png, path2.png, …]
  try {
    const pages = await Promise.all(
      pageImages.map((img) => gateway.extractText(img)),
    );
    return pages.map((text, i) => `--- page ${i + 1} ---\n${text}`).join("\n\n");
  } finally {
    // Always remove the temp images, even if OCR fails part-way through.
    await Promise.all(pageImages.map((img) => unlink(img).catch(() => undefined)));
  }
}
Page markers in the joined text help the chunker keep page boundaries visible in citations later.
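One way a chunker could use those markers is to map a chunk's character offset back to a page number. The sketch below assumes the exact marker format produced above; pageAtOffset is a hypothetical helper, not something rag.ts is known to contain.

```typescript
// Returns the 1-based page whose marker most recently precedes `offset`,
// or undefined if the offset comes before the first marker.
function pageAtOffset(joined: string, offset: number): number | undefined {
  // Fresh regex per call so lastIndex always starts at 0.
  const marker = /^--- page (\d+) ---$/gm;
  let page: number | undefined;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(joined)) !== null) {
    if (match.index > offset) break;
    page = Number(match[1]);
  }
  return page;
}

// Same joining scheme as extractPdfText, with two short fake pages.
const joined = ["--- page 1 ---\nalpha", "--- page 2 ---\nbeta"].join("\n\n");

console.log(pageAtOffset(joined, joined.indexOf("alpha"))); // 1
console.log(pageAtOffset(joined, joined.indexOf("beta"))); // 2
```

A scanned page that happens to contain a line in the same format would confuse this lookup, so a less common marker string may be worth choosing if citations depend on it.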
4. Plumb the gateway through the workflow
LocalLensApp already owns a QvacGateway. Pass it down to
discoverTextDocuments so the file adapter can call extractText
without needing to know about model lifecycle:
async createBrainFromFolder(input: CreateBrainFromFolderInput): Promise<Brain> {
  const folderPath = path.resolve(input.folderPath);
  const documents = await discoverTextDocuments(folderPath, this.qvac);
  return this.createBrainFromLocalDocuments(input.name.trim(), folderPath, documents);
}
That's the only signature that changes. The rest of the workflow
— chunking, ingestion, the JSON store — operates on
LocalDocument[] exactly as before.
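If you want the file adapter to be testable without loading a model, one option is to have it depend on a narrow interface rather than the full gateway. This is a sketch of that idea, not the project's actual code: TextExtractor, readContent, fakeExtractor and fakeReadText are hypothetical names, and the extension lookup is simplified.

```typescript
// Hypothetical narrow interface: the only part of QvacGateway the adapter needs.
interface TextExtractor {
  extractText(imagePath: string): Promise<string>;
}

const ocrExtensions = new Set([".bmp", ".jpg", ".jpeg", ".png", ".tiff"]);

// Picks the content source for one file. readText stands in for
// readFile(path, "utf8"); the extension lookup is a simplified extname.
function readContent(
  filePath: string,
  extractor: TextExtractor,
  readText: (filePath: string) => Promise<string>,
): Promise<string> {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  const ext = dot <= 0 ? "" : base.slice(dot).toLowerCase();
  return ocrExtensions.has(ext) ? extractor.extractText(filePath) : readText(filePath);
}

// Fake used in place of the real gateway, so no model is loaded.
const ocrCalls: string[] = [];
const fakeExtractor: TextExtractor = {
  extractText: async (imagePath) => {
    ocrCalls.push(imagePath);
    return "recognized text";
  },
};
const fakeReadText = async (_filePath: string): Promise<string> => "plain text";

void readContent("scans/receipt.PNG", fakeExtractor, fakeReadText);
void readContent("notes/todo.md", fakeExtractor, fakeReadText);

console.log(ocrCalls);
```

Only the image path reaches the extractor; the Markdown file goes through the plain-text reader. QvacGateway satisfies TextExtractor structurally, so passing this.qvac keeps working unchanged.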
What you don't need to change
- rag.ts: chunking is unchanged. It takes a LocalDocument and doesn't care whether the content came from disk text or OCR.
- store.ts: the JSON shape is the same.
- domain.ts: LocalDocument already has relativePath, content, checksum, and bytes. That's everything the rest of the pipeline needs.
That's the payoff of the file-adapter seam: a new format is one gateway method plus one branch in the file walker.
Tear down OCR when you're done indexing
The OCR model can stay loaded for the lifetime of the gateway.
If you want to free its memory after a one-shot CLI run, call
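await unloadModel({ modelId: this.ocrModelId, clearStorage: false })
inside QvacGateway.close(), ahead of the close() call that method
already makes. Skip the unload when this.ocrModelId is still
undefined, which means OCR never ran.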
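The guard around that call can be sketched as follows. The argument shape is copied from the call above; everything else is hypothetical: OcrHandle and release are invented names, the Promise<void> return type is assumed, and unload is injected so the example runs without the SDK.

```typescript
// Argument shape taken from the unloadModel call above; the return type is assumed.
type Unload = (args: { modelId: string; clearStorage: boolean }) => Promise<void>;

class OcrHandle {
  ocrModelId: string | undefined;
  private readonly unload: Unload;

  constructor(unload: Unload) {
    this.unload = unload;
  }

  // Safe to call repeatedly, and safe when OCR was never loaded.
  async release(): Promise<void> {
    const modelId = this.ocrModelId;
    if (modelId === undefined) return;
    this.ocrModelId = undefined; // clear first so a second call is a no-op
    await this.unload({ modelId, clearStorage: false });
  }
}

// Stand-in for unloadModel that records which models were released.
const unloaded: string[] = [];
const handle = new OcrHandle(async ({ modelId }) => {
  unloaded.push(modelId);
});

void handle.release(); // nothing loaded yet: no unload call
handle.ocrModelId = "ocr-model-id";
void handle.release();
void handle.release(); // already released: no second unload

console.log(unloaded);
```

Clearing the ID before awaiting the unload means a later extractText call reloads the model instead of reusing a stale ID.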
External references
- QVAC OCR reference
- OCR_LATIN_RECOGNIZER_1 model: language options and additional recognizers.