Devlog #2. Last time I covered the hybrid PDF engine setup (Rust + PDFKit + Swift). This time: getting Apple Vision API working from Tauri for fully offline OCR.
The goal
Extract text from scanned/image-based PDFs — completely offline, no cloud involved.
Apple's Vision Framework ships with every Mac, handles Japanese and English well, and costs nothing. The challenge: Tauri is Rust-based, and you can't call Swift APIs directly.
The bridge architecture
React (UI)
↓ invoke()
Tauri Command (Rust)
↓ std::process::Command
Swift CLI binary
↓
Apple Vision Framework (OCR)
↓ stdout (JSON)
Rust → back to React
Compile Swift as a standalone CLI binary, call it from Rust as a child process. Simple, effective.
Swift CLI (excerpt)
import Vision
import Foundation
func recognizeText(imagePath: String) -> String {
guard let image = NSImage(contentsOfFile: imagePath),
let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil)
else { return "{\"error\": \"image load failed\"}" }
var results: [String] = []
let request = VNRecognizeTextRequest { req, _ in
guard let observations = req.results as? [VNRecognizedTextObservation] else { return }
results = observations.compactMap { $0.topCandidates(1).first?.string }
}
request.recognitionLevel = .accurate
request.usesLanguageCorrection = true
request.recognitionLanguages = ["ja-JP", "en-US"]
let handler = VNImageRequestHandler(cgImage: cgImage)
try? handler.perform([request])
let json = try? JSONSerialization.data(withJSONObject: ["text": results.joined(separator: "\n")])
return String(data: json ?? Data(), encoding: .utf8) ?? "{}"
}
let args = CommandLine.arguments
if args.count > 1 { print(recognizeText(imagePath: args[1])) }
Rust caller (excerpt)
use std::process::Command;
pub fn ocr_pdf_page(image_path: &str) -> Result {
let output = Command::new("/path/to/vision-ocr-cli")
.arg(image_path)
.output()
.map_err(|e| e.to_string())?;
let stdout = String::from_utf8_lossy(&output.stdout);
Ok(stdout.to_string())
}
Where I got stuck
1. Sandbox permissions
Tauri apps run sandboxed by default. External process execution needs an explicit allowlist in tauri.conf.json:
"allowlist":{"shell":{"execute":true}}
2. Japanese OCR accuracy
Without setting recognitionLanguages, Vision defaults to English-first. Adding ["ja-JP", "en-US"] made a significant difference for Japanese documents.
Current state (dev build)
Load a scanned PDF, get selectable/copyable text out. Everything runs locally — no data leaves the machine.
Next devlog
AES-256 encryption and Zero Leak Architecture — how I designed a PDF tool that never touches the internet.
Hiyoko PDF Vault → https://hiyokoko.gumroad.com/l/HiyokoPDFVault
X → @hiyoyok