LocalLens

Add PDF and image parsing with QVAC OCR

Reuse the QVAC OCR task to turn scanned pages and images into LocalDocuments.

Where it belongs: src/qvac.ts for the OCR call, src/files.ts for the adapter glue. The rest of the pipeline doesn't need to know.

LocalLens currently indexes plain-text formats: Markdown, source code, JSON, YAML. To bring images and scanned PDFs into the same brain, you don't need a third-party OCR engine. The QVAC SDK ships an OCR task that you can wire up alongside the chat and embedding models. PDFs still need a page rasteriser, because the OCR task only reads images (step 3).

The official QVAC OCR reference lives at docs.qvac.tether.io/sdk/examples/ai-tasks/ocr. The recipe below assumes you have it open.

The QVAC OCR surface

import {
  loadModel,
  ocr,
  unloadModel,
  OCR_LATIN_RECOGNIZER_1,
} from "@qvac/sdk";

A single model, OCR_LATIN_RECOGNIZER_1, drives the recognizer. ocr({ modelId, image, options }) returns an object with a blocks property: a promise that resolves to an array of { text, bbox?, confidence? } objects.
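To make that shape concrete, here is a small standalone sketch that types the result as described and flattens it into text. OcrBlock, OcrCall, collectText and the optional minConfidence filter are illustrative names, not SDK exports. bbox is typed as unknown because its exact shape isn't described here, and the filter assumes confidence is a number where higher means more certain; check the OCR reference for its actual scale.

```typescript
// Assumed shape of one recognized block, per the description above.
// bbox is left as `unknown` because its exact type isn't spelled out here.
interface OcrBlock {
  text: string;
  bbox?: unknown;
  confidence?: number;
}

// The ocr(...) call returns immediately; `blocks` settles once recognition finishes.
interface OcrCall {
  blocks: Promise<OcrBlock[]>;
}

// Flatten recognized blocks into newline-separated text, dropping blank ones.
// minConfidence is a hypothetical filter; blocks without a confidence are kept.
export async function collectText(call: OcrCall, minConfidence = 0): Promise<string> {
  const blocks = await call.blocks;
  return blocks
    .filter((block) => block.confidence === undefined || block.confidence >= minConfidence)
    .map((block) => block.text)
    .filter((line) => line.trim().length > 0)
    .join("\n");
}
```

Leaving minConfidence at 0 keeps every non-blank block, which matches what the gateway method in step 1 does.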

The four-step recipe

1. Add an OCR method to the gateway

Extend QvacGateway so it loads the OCR model lazily, alongside the chat and embedding models, and exposes a single extractText helper:

src/qvac.ts
import {
  loadModel,
  ocr,
  unloadModel, // used by close() when tearing OCR down
  OCR_LATIN_RECOGNIZER_1,
} from "@qvac/sdk";

export class QvacGateway {
  // existing fields…
  private ocrModelId: string | undefined;
  // In-flight load, shared so concurrent extractText calls load the model once.
  private ocrLoading: Promise<string> | undefined;

  private ensureOcrReady(): Promise<string> {
    if (!this.ocrLoading) {
      this.ocrLoading = loadModel({
        modelSrc: OCR_LATIN_RECOGNIZER_1,
        modelType: "ocr",
        modelConfig: {
          langList: ["en"],
          useGPU: true,
          timeout: 30000,
        },
      })
        .then((modelId) => (this.ocrModelId = modelId))
        .catch((error: unknown) => {
          this.ocrLoading = undefined; // forget the failed load so the next call retries
          throw error;
        });
    }
    return this.ocrLoading;
  }

  async extractText(imagePath: string): Promise<string> {
    const modelId = await this.ensureOcrReady();
    const { blocks } = ocr({
      modelId,
      image: imagePath,
      options: { paragraph: false },
    });
    const result = await blocks;
    // One line per recognized block; drop blocks that are only whitespace.
    return result
      .map((block) => block.text)
      .filter((line) => line.trim().length > 0)
      .join("\n");
  }
}

Same pattern as the rest of the gateway: load lazily, share one in-flight load promise between concurrent callers, and expose one method per task. The OCR model is independent of the chat and embedding models, so loading it doesn't block them, and a folder with no images never loads it at all.
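Stripped of QVAC specifics, the shared in-flight load looks like the sketch below. LazyResource is a hypothetical name, and the load function stands in for a loadModel(...) call. The point it demonstrates: the first caller starts the load, concurrent callers await the same promise, and a failed load is forgotten so the next call can retry.

```typescript
// Single-flight lazy loading: one load, shared by every concurrent caller.
// `load` stands in for a loadModel(...) call; this class is not part of any SDK.
export class LazyResource<T> {
  private readonly load: () => Promise<T>;
  private pending: Promise<T> | undefined;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  get(): Promise<T> {
    const existing = this.pending;
    if (existing) return existing; // reuse the load that is running or finished

    const loading: Promise<T> = this.load().catch((error: unknown) => {
      // Forget a failed load so the next get() retries, unless reset() already replaced it.
      if (this.pending === loading) this.pending = undefined;
      throw error;
    });
    this.pending = loading;
    return loading;
  }

  // Drop the cached promise, for example after unloading the model.
  reset(): void {
    this.pending = undefined;
  }
}
```

Without the shared promise, a caller that OCRs several pages in parallel would start one model load per page before the first load finished.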

2. Branch on extension when reading

Update discoverTextDocuments (and browserDocumentsFromInput) so image-shaped files run through OCR before becoming a LocalDocument. The LocalDocument shape stays the same. Only the source of the content changes.

src/files.ts
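import path from "node:path";
import { readFile } from "node:fs/promises";
import type { LocalDocument } from "./domain.ts";
import type { QvacGateway } from "./qvac.ts";

// Image formats routed through QVAC OCR instead of being read as UTF-8 text.
const ocrExtensions = new Set([".bmp", ".jpg", ".jpeg", ".png", ".tiff"]);

export async function discoverTextDocuments(
  rootPath: string,
  gateway: QvacGateway,
): Promise<LocalDocument[]> {
  // … existing folder walk …
  // path.extname handles native separators, including Windows backslashes.
  const ext = path.extname(absolutePath).toLowerCase();

  let content: string;
  if (ocrExtensions.has(ext)) {
    // Treat an OCR failure like an unreadable file: empty content, skipped later.
    content = await gateway.extractText(absolutePath).catch(() => "");
  } else {
    content = await readFile(absolutePath, "utf8").catch(() => "");
  }
  // … rest unchanged …
}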

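supportedExtensions grows by the OCR set. The rules below it (no null bytes, ≤2 MB, non-empty content) keep filtering the same way, provided they run on the extracted text rather than on the file as stored: raw image bytes almost always contain null bytes, and a scanned image can easily exceed 2 MB. An image in which OCR recognizes no text is skipped, just like an empty text file.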
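Here is a sketch of those three filters applied to the text that will be indexed (OCR output for images, file contents for text files). isIndexable and MAX_BYTES are hypothetical names, and reading "2 MB" as 2 × 1024 × 1024 bytes of UTF-8 is an assumption; match whatever constant and measurement files.ts actually uses.

```typescript
// Assumption: "2 MB" means 2 × 1024 × 1024 bytes; reuse the constant files.ts defines.
const MAX_BYTES = 2 * 1024 * 1024;

// Hypothetical helper mirroring the walker's filters, applied to extracted text.
export function isIndexable(content: string): boolean {
  if (content.trim().length === 0) return false; // empty file, or OCR found no text
  if (content.includes("\u0000")) return false; // null bytes signal binary data
  // Measure the UTF-8 size, not the string length, so multi-byte text is counted fairly.
  return new TextEncoder().encode(content).byteLength <= MAX_BYTES;
}
```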

3. Handle PDFs page by page

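QVAC OCR takes an image, not a PDF, so add .pdf to the extensions the walker accepts and route it to a helper like the one below. The helper converts each PDF page to an image first with any rasteriser (pdftoppm, pdf-poppler, or a Node binding), then calls gateway.extractText once per page and joins the results: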

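import { unlink } from "node:fs/promises";

// Hypothetical helper you supply: rasterises each page to a temp image, in page order.
declare function rasterisePdfToTempImages(filePath: string): Promise<string[]>;

async function extractPdfText(filePath: string, gateway: QvacGateway): Promise<string> {
  const pageImages = await rasterisePdfToTempImages(filePath); // [path1.png, path2.png, …]
  try {
    // One page at a time through the single shared OCR model.
    const pages: string[] = [];
    for (const img of pageImages) pages.push(await gateway.extractText(img));
    return pages.map((text, i) => `--- page ${i + 1} ---\n${text}`).join("\n\n");
  } finally {
    // Remove the temporary page images even if OCR fails part-way.
    await Promise.all(pageImages.map((img) => unlink(img).catch(() => undefined)));
  }
}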

The page markers are plain text, so the unchanged chunker carries them into the chunks, which keeps page boundaries visible when you cite sources later.
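If you later want the page number for a given position in a PDF's content (for a citation, say), the markers can be read back from the joined text. joinPages repeats the joining logic from extractPdfText; pageForOffset is a hypothetical helper, and it assumes the recognized text never contains a line that itself looks like a marker.

```typescript
// Join per-page OCR text with the same "--- page N ---" markers extractPdfText uses.
export function joinPages(pages: string[]): string {
  return pages.map((text, i) => `--- page ${i + 1} ---\n${text}`).join("\n\n");
}

// Hypothetical helper: the page a character offset falls on is the last marker
// at or before that offset. Returns undefined for text without markers.
export function pageForOffset(joined: string, offset: number): number | undefined {
  const marker = /^--- page (\d+) ---$/gm; // markers sit on their own line
  let page: number | undefined;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(joined)) !== null && match.index <= offset) {
    page = Number(match[1]);
  }
  return page;
}
```

For example, pageForOffset(joinPages(["alpha", "beta"]), joined.indexOf("beta")) returns 2.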

4. Plumb the gateway through the workflow

LocalLensApp already owns a QvacGateway. Pass it down to discoverTextDocuments so the file adapter can call extractText without needing to know about model lifecycle:

src/locallens.ts
async createBrainFromFolder(input: CreateBrainFromFolderInput): Promise<Brain> {
  const folderPath = path.resolve(input.folderPath);
  const documents = await discoverTextDocuments(folderPath, this.qvac);
  return this.createBrainFromLocalDocuments(input.name.trim(), folderPath, documents);
}

That's the only signature that changes. The rest of the workflow — chunking, ingestion, the JSON store — operates on LocalDocument[] exactly as before.
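Because the file adapter only ever calls extractText, one option is to type the parameter as a narrow interface instead of the full QvacGateway: the gateway still satisfies it structurally, and tests can pass a fake without loading any model. TextExtractor and readContent are hypothetical names, and readText and readPdf are injected here (standing in for readFile and a PDF helper such as extractPdfText) so the sketch runs without touching the disk.

```typescript
import path from "node:path";

// Hypothetical narrow interface: the only gateway method the file adapter uses.
export interface TextExtractor {
  extractText(imagePath: string): Promise<string>;
}

const ocrExtensions = new Set([".bmp", ".jpg", ".jpeg", ".png", ".tiff"]);

// Pick the content source by extension; any failure becomes empty content,
// which the walker's non-empty rule then skips.
export async function readContent(
  filePath: string,
  extractor: TextExtractor,
  readText: (filePath: string) => Promise<string>,
  readPdf: (filePath: string, extractor: TextExtractor) => Promise<string>,
): Promise<string> {
  const ext = path.extname(filePath).toLowerCase();
  if (ocrExtensions.has(ext)) return extractor.extractText(filePath).catch(() => "");
  if (ext === ".pdf") return readPdf(filePath, extractor).catch(() => "");
  return readText(filePath).catch(() => "");
}
```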

What you don't need to change

  • rag.ts: chunking is unchanged. It takes a LocalDocument and doesn't care whether the content came from disk text or OCR.
  • store.ts: the JSON shape is the same.
  • domain.ts: LocalDocument already has relativePath, content, checksum, and bytes. That's everything the rest of the pipeline needs.

That's the payoff of the file-adapter seam: a new format is one gateway method plus one branch in the file walker.

Tear down OCR when you're done indexing

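The OCR model can stay loaded for the lifetime of the gateway. To free its memory after a one-shot CLI run, have QvacGateway.close() call await unloadModel({ modelId: this.ocrModelId, clearStorage: false }) before its existing close() call. Guard the call so it only runs when an OCR model was actually loaded, and clear the gateway's cached OCR state afterwards so a later extractText loads the model again.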
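The teardown order can be sketched independently of the SDK. closeWithOcr and OcrState are hypothetical names; unload stands in for QVAC's unloadModel and closeRest for the gateway's existing teardown, both injected so the ordering can be exercised without a real model.

```typescript
// Hypothetical shape of the gateway state this sketch touches.
interface OcrState {
  ocrModelId: string | undefined;
}

export async function closeWithOcr(
  state: OcrState,
  unload: (args: { modelId: string; clearStorage: boolean }) => Promise<unknown>,
  closeRest: () => Promise<void>,
): Promise<void> {
  const modelId = state.ocrModelId;
  state.ocrModelId = undefined; // a later extractText call reloads the model
  try {
    // Only unload if OCR was ever used; clearStorage: false as in the text above.
    if (modelId) await unload({ modelId, clearStorage: false });
  } finally {
    await closeRest(); // run the existing teardown even if unloading fails
  }
}
```

Clearing the id before awaiting the unload means a second close() call never tries to unload the same model twice.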
