Calling Apple Vision API from Tauri for Offline OCR [PDF Devlog #2]


Devlog #2. Last time I covered the hybrid PDF engine setup (Rust + PDFKit + Swift). This time: getting Apple Vision API working from Tauri for fully offline OCR.


The goal

Extract text from scanned/image-based PDFs — completely offline, no cloud involved.

Apple's Vision Framework ships with every Mac, handles Japanese and English well, and costs nothing. The challenge: Tauri is Rust-based, and you can't call Swift APIs directly.


The bridge architecture

React (UI)
  ↓ invoke()
Tauri Command (Rust)
  ↓ std::process::Command
Swift CLI binary
  ↓
Apple Vision Framework (OCR)
  ↓ stdout (JSON)
Rust → back to React

Compile Swift as a standalone CLI binary, call it from Rust as a child process. Simple, effective.


Swift CLI (excerpt)

import AppKit   // NSImage lives in AppKit, not Foundation
import Foundation
import Vision

func recognizeText(imagePath: String) -> String {
    guard let image = NSImage(contentsOfFile: imagePath),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil)
    else { return "{\"error\": \"image load failed\"}" }

    var results: [String] = []
    let request = VNRecognizeTextRequest { req, _ in
        guard let observations = req.results as? [VNRecognizedTextObservation] else { return }
        results = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true
    request.recognitionLanguages = ["ja-JP", "en-US"]

    // perform() runs the request synchronously and invokes the completion
    // handler before returning, so `results` is populated by the next line.
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try? handler.perform([request])

    let json = try? JSONSerialization.data(withJSONObject: ["text": results.joined(separator: "\n")])
    return String(data: json ?? Data(), encoding: .utf8) ?? "{}"
}

let args = CommandLine.arguments
if args.count > 1 { print(recognizeText(imagePath: args[1])) }
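For reference, a single-file version of this CLI can be built with `swiftc`. The file and binary names here are placeholders, not the project's actual layout:

```sh
# Produce a standalone optimized binary that Rust can spawn as a child process.
swiftc -O vision-ocr.swift -o vision-ocr-cli
```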

Rust caller (excerpt)

use std::process::Command;

pub fn ocr_pdf_page(image_path: &str) -> Result<String, String> {
    let output = Command::new("/path/to/vision-ocr-cli")
        .arg(image_path)
        .output()
        .map_err(|e| e.to_string())?;

    let stdout = String::from_utf8_lossy(&output.stdout);
    Ok(stdout.to_string())
}
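One caveat with the excerpt above: it returns stdout unconditionally, even if the Swift binary crashed. A sketch of a hardened caller that checks the exit status and surfaces stderr instead (`run_ocr_cli` is a hypothetical helper name, not the project's API):

```rust
use std::process::Command;

/// Run an external OCR CLI and return its stdout, or its stderr text on failure.
/// Sketch only: the binary path is passed in rather than hard-coded.
pub fn run_ocr_cli(binary: &str, image_path: &str) -> Result<String, String> {
    let output = Command::new(binary)
        .arg(image_path)
        .output()
        .map_err(|e| format!("failed to spawn {binary}: {e}"))?;

    if !output.status.success() {
        // Anything the child printed to stderr becomes the error message.
        return Err(String::from_utf8_lossy(&output.stderr).into_owned());
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```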

Where I got stuck

1. Sandbox permissions

Tauri locks shell access behind an allowlist by default. Executing an external binary needs an explicit entry in tauri.conf.json (Tauri v1 config shape):

"allowlist": {
  "shell": {
    "execute": true
  }
}
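A hard-coded `/path/to/vision-ocr-cli` also breaks once the app is bundled. Tauri v1 can ship the binary with the app as a sidecar; roughly like this, where the key names follow the Tauri v1 config schema and the `binaries/` path is a placeholder:

```json
{
  "tauri": {
    "bundle": {
      "externalBin": ["binaries/vision-ocr-cli"]
    },
    "allowlist": {
      "shell": {
        "execute": true,
        "sidecar": true
      }
    }
  }
}
```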

2. Japanese OCR accuracy

Without setting recognitionLanguages, Vision defaults to English-first. Adding ["ja-JP", "en-US"] made a significant difference for Japanese documents.


Current state (dev build)

Load a scanned PDF, get selectable/copyable text out. Everything runs locally — no data leaves the machine.


Next devlog

AES-256 encryption and Zero Leak Architecture — how I designed a PDF tool that never touches the internet.


Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok

Source: dev.to
