Grammarly costs $12/mo — a local LLM does it for free (Chrome + Ollama)

dev.to

I write a lot in the browser — email, GitHub comments, contact forms — and I wanted proofreading without uploading every keystroke to a company's cloud. My workplace bans Grammarly for exactly that reason.

So I built inline-scribe: a Chrome extension that proofreads your text with an AI that runs on your own machine (Ollama). Nothing leaves your computer. And the fixes show up like Word's Track Changes — accept or reject each one individually with ✓ / ✕.

This post is about the two design decisions I found most interesting while building it. Both generalize to anyone wiring a local LLM into a product.

  1. The LLM never produces the diff.
  2. Silencing Ollama's 403 with declarativeNetRequest — zero-config, no OLLAMA_ORIGINS.

The missing ingredient was never the AI

If you write in a browser today, you pick one of three bad options:

  1. Grammarly — great UX, but every keystroke goes to their cloud, the good features are behind a subscription, and many workplaces ban it (legal docs, unreleased code, patient data).
  2. Paste into ChatGPT — you get one big rewritten blob back. Which words changed? Did it alter your meaning? You re-read everything, every time, and your text still went to someone else's server.
  3. Nothing — and ship the typos.

The thing is, the AI isn't the hard part anymore. Anyone can run a capable model locally with Ollama in two commands, for free. What's missing is the interface. The reason Grammarly was worth paying for was never the grammar engine — it was the friendly diff that lets you see and control each change.

That interface, on top of a model you own, is the whole product.

| Tool | Corrections | Your text goes to | Inline diff, per-fix accept/reject | Price |
|---|---|---|---|---|
| Grammarly | cloud AI | their servers | ✅ (the reason people pay) | $12+/mo |
| Harper (10k★) | local, rule-based | nowhere ✅ | ❌ underlines typos only; can't rewrite a clumsy sentence | free |
| scramble / Typollama | local LLM ✅ | nowhere ✅ | ❌ whole-text replacement or popup | free |
| inline-scribe | local LLM ✅ | nowhere ✅ | ✅ | free |

Design decision #1: the LLM never produces the diff

This is the one I most want to share.

The intuitive move is to ask the model for structured output: "return the changes as JSON," something like [{ "delete": "...", "insert": "..." }, ...], and pipe it straight into the UI.

But small local models break when you do this. A model like llama3.2 (3B) is surprisingly good at fixing prose and terrible at structured output: it breaks the JSON, adds explanations, wraps everything in a code fence, renames your keys. A chatty 3B model means a broken UI.
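To make that failure mode concrete, here is a minimal sketch of what a strict "changes as JSON" parser would have to cope with. The `parseChanges` helper and the sample replies are hypothetical illustrations, not recorded model output and not code from inline-scribe.

```typescript
// Hypothetical shape for a model-produced change list.
type Change = { delete: string; insert: string };

// Strict parser: returns null unless the reply is exactly a JSON array of
// { delete, insert } objects with string values.
function parseChanges(reply: string): Change[] | null {
  try {
    const parsed: unknown = JSON.parse(reply);
    if (!Array.isArray(parsed)) return null;
    const wellFormed = parsed.every(
      (c) =>
        typeof c === 'object' &&
        c !== null &&
        typeof c.delete === 'string' &&
        typeof c.insert === 'string',
    );
    return wellFormed ? (parsed as Change[]) : null;
  } catch {
    return null; // not valid JSON at all
  }
}

// Made-up replies covering the failure modes listed above.
const sampleReplies = [
  '[{"delete":"teh","insert":"the"}]',                            // clean
  'Sure! Here are the changes:\n[{"delete":"teh","insert":"the"}]', // added explanation
  '```json\n[{"delete":"teh","insert":"the"}]\n```',              // code fence
  '[{"remove":"teh","add":"the"}]',                               // renamed keys
];

console.log(sampleReplies.map((r) => (parseChanges(r) ? 'ok' : 'rejected')));
// only the first reply parses; the other three come back as null
```

Each of the three rejected replies still contains a perfectly good correction, which is the argument for asking for plain text instead.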

So I split the responsibilities:

  • The model's job: return corrected prose — just text.
  • The extension's job: compute the changes (hunks) from (original, corrected) with a deterministic algorithm.

you press Alt+G in a text field
   │
   ▼
the extension sends your text to YOUR endpoint     ← default: Ollama on 127.0.0.1
(an OpenAI-compatible /chat/completions API)          model: llama3.2 (~2GB, free)
   │
   ▼
the model returns corrected prose — just text
   │
   ▼
inline-scribe computes a word-level diff           ← deterministic algorithm,
between your text and the correction                  NOT the LLM's opinion
   │
   ▼
review panel: accept ✓ / reject ✕ each change → Apply writes back only what you approved
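The second and third steps of that flow can be sketched as follows. This is an illustration under assumptions, not the extension's actual `OllamaChecker`: the function names, the prompt wording, and `temperature: 0` are hypothetical. The request and response shapes are the standard OpenAI chat-completions ones, which Ollama serves under `http://127.0.0.1:11434/v1`.

```typescript
// Hypothetical prompt wording; the post does not show its real system prompt.
const SYSTEM_PROMPT =
  'Fix spelling and grammar. Reply with the corrected text only, no commentary.';

interface ChatRequest {
  model: string;
  stream: boolean;
  temperature: number;
  messages: { role: 'system' | 'user'; content: string }[];
}

// Build the JSON body for POST {baseUrl}/chat/completions.
function buildChatRequest(
  text: string,
  model: string,
  systemPrompt: string = SYSTEM_PROMPT,
): ChatRequest {
  return {
    model,
    stream: false,   // one complete reply, no streamed chunks
    temperature: 0,  // assumption: favour repeatable corrections
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: text },
    ],
  };
}

// Pull the assistant text out of an OpenAI-style response body.
function extractReply(body: unknown): string {
  const content = (body as { choices?: { message?: { content?: unknown } }[] } | null)
    ?.choices?.[0]?.message?.content;
  if (typeof content !== 'string') throw new Error('Unexpected response shape');
  return content;
}

// baseUrl example: 'http://127.0.0.1:11434/v1'
async function checkText(baseUrl: string, model: string, text: string): Promise<string> {
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(text, model)),
  });
  if (!res.ok) throw new Error(`Endpoint returned HTTP ${res.status}`);
  return extractReply(await res.json());
}
```

Because the model is only ever asked for text, the same `buildChatRequest` works against any OpenAI-compatible server by changing `baseUrl` and `model`.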

The diff tokenizes into words + whitespace + punctuation runs, then does an LCS (longest common subsequence) walk:

// Tokenize into words/whitespace/punctuation, preserving everything
export function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/gu) ?? [];
}

export function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  // DP table of LCS lengths over a × b (Uint32Array rows)
  // Walk the table emitting equal / delete / insert, merging adjacent ops.
  // Collapse delete+insert neighbours into one `replace` so a phrase rewrite
  // reads as a single reviewable hunk instead of three.
  ...
}
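Since the body of `diffText` is elided above, here is one way the commented steps could be filled in. It is a sketch under assumptions, not the extension's real implementation: the `Hunk` shape is inferred from the `{ kind, original, corrected }` description, the tokenizer uses the same three-way split without the `u` flag, and the tie-breaking in the walk is an arbitrary choice.

```typescript
// Hypothetical Hunk shape, inferred from the post's { kind, original, corrected }.
type Hunk = {
  kind: 'equal' | 'delete' | 'insert' | 'replace';
  original: string;
  corrected: string;
};

// Whitespace runs, punctuation runs, and word runs; nothing is dropped.
function tokenize(text: string): string[] {
  return text.match(/\s+|[^\s\w]+|\w+/g) ?? [];
}

function diffText(original: string, corrected: string): Hunk[] {
  const a = tokenize(original);
  const b = tokenize(corrected);
  const n = a.length;
  const m = b.length;

  // lcs[i][j] = LCS length of a[i..] and b[j..]; the extra row and column stay 0.
  const lcs: Uint32Array[] = [];
  for (let i = 0; i <= n; i++) lcs.push(new Uint32Array(m + 1));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const hunks: Hunk[] = [];
  // Append one token-level op, merging it into the previous hunk when possible.
  const push = (kind: 'equal' | 'delete' | 'insert', o: string, c: string): void => {
    const last = hunks[hunks.length - 1];
    if (last && last.kind === kind) {
      last.original += o;
      last.corrected += c;
    } else if (last && last.kind !== 'equal' && kind !== 'equal') {
      // Adjacent delete/insert ops collapse into a single replace hunk.
      last.kind = 'replace';
      last.original += o;
      last.corrected += c;
    } else {
      hunks.push({ kind, original: o, corrected: c });
    }
  };

  // Walk the table from the top-left corner.
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) { push('equal', a[i], b[j]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push('delete', a[i], ''); i++; }
    else { push('insert', '', b[j]); j++; }
  }
  while (i < n) { push('delete', a[i], ''); i++; }
  while (j < m) { push('insert', '', b[j]); j++; }
  return hunks;
}

const demo = diffText('I has a apple.', 'I have an apple.');
console.log(demo.filter((h) => h.kind !== 'equal'));
// two replace hunks: has → have, and a → an
```

Concatenating every hunk's `original` reproduces the input and concatenating every `corrected` reproduces the model's reply, which is a convenient invariant to unit-test.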

This split has a lot of happy side effects:

  • Model-agnostic. Any OpenAI-compatible endpoint works (llama.cpp, LM Studio, vLLM, or your own key). Since nothing depends on structured-output quality, the UI behaves the same whether you run 3B or 70B.
  • Deterministic, so reproducible. The same (original, corrected) pair always yields the same hunks, which makes the diff easy to unit-test.
  • Accept/reject is trivial. A hunk is { kind, original, corrected }. Accepted hunks take corrected, rejected hunks take original; concatenate and you're done.

export function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}
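A short worked example of that function, with a hand-written hunk list. The `Hunk` type here is an assumed shape based on the description above, and `applyDecisions` is repeated only so the snippet runs on its own.

```typescript
// Assumed Hunk shape; the post describes it as { kind, original, corrected }.
type Hunk = {
  kind: 'equal' | 'delete' | 'insert' | 'replace';
  original: string;
  corrected: string;
};

// Same logic as the function in the post.
function applyDecisions(hunks: Hunk[], accepted: boolean[]): string {
  let result = '';
  hunks.forEach((h, idx) => {
    if (h.kind === 'equal') result += h.original;
    else result += accepted[idx] ? h.corrected : h.original;
  });
  return result;
}

// Hand-written hunks for "I has a apple." → "I have an apple."
const hunks: Hunk[] = [
  { kind: 'equal', original: 'I ', corrected: 'I ' },
  { kind: 'replace', original: 'has', corrected: 'have' },
  { kind: 'equal', original: ' ', corrected: ' ' },
  { kind: 'replace', original: 'a', corrected: 'an' },
  { kind: 'equal', original: ' apple.', corrected: ' apple.' },
];

// Accept the first fix, reject the second.
console.log(applyDecisions(hunks, [false, true, false, false, false]));
// → "I have a apple."
```

Rejecting everything returns the original text and accepting everything returns the model's correction, so the user can never end up with text that neither side produced outside the hunks they chose.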

Even when told to "return only text," small models still wrap output in fences or quotes. I gave up on prompting that away and instead strip the obvious wrappers in post-processing, without touching real content:

export function stripWrapping(reply: string, original: string): string {
  let out = reply.replace(/\r\n/g, '\n');
  const fence = out.match(/^\s*```[a-z]*\n([\s\S]*?)\n```\s*$/);
  if (fence) out = fence[1];                       // strip ```...```
  out = out.replace(/^\s+|\s+$/g, '');
  if (/^".*"$/s.test(out) && !/^"/.test(original.trim())) out = out.slice(1, -1); // strip whole-reply quotes
  if (original.endsWith('\n') && !out.endsWith('\n')) out += '\n'; // preserve trailing-newline convention
  return out;
}

Takeaway: let small local models do what they're good at (return natural language) and keep the structure — diffs, JSON, state — in deterministic code. This isn't specific to proofreading; it's a general principle for putting a local LLM into a product.

Design decision #2: silencing Ollama's 403 with declarativeNetRequest

This is the pothole every Chrome-extension × Ollama project hits.

Stock Ollama rejects requests carrying a chrome-extension://... Origin with a 403 — a guard against cross-origin access from extensions. The official workaround is to have the user set the OLLAMA_ORIGINS env var. But asking for that means:

  • exporting OLLAMA_ORIGINS=chrome-extension://<id-that-changes>
  • different steps depending on shell config and how Ollama is launched
  • the single biggest "I installed it and it doesn't work" drop-off point

In other words, the moment your README documents an env var, you've lost. It should just work with a stock ollama serve.

The fix: use MV3's declarativeNetRequest (DNR) to strip the Origin header from requests to the configured endpoint with a dynamic rule. No Origin, no 403.

async function syncOriginRule(): Promise<void> {
  const stored = await chrome.storage.sync.get('config');
  const endpoint = stored.config?.endpoint ?? DEFAULT_CONFIG.endpoint;
  const host = new URL(endpoint).host; // e.g. 127.0.0.1:11434

  await chrome.declarativeNetRequest.updateDynamicRules({
    removeRuleIds: [1],
    addRules: [{
      id: 1,
      priority: 1,
      condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
      action: {
        type: 'modifyHeaders',
        requestHeaders: [{ header: 'origin', operation: 'remove' }],
      },
    }],
  });
}

Key points:

  • Scope the rule to the user's configured host only (urlFilter). It's not a dangerous "strip Origin everywhere" rule.
  • The endpoint is configurable, so watch chrome.storage.onChanged and re-apply the rule whenever config changes.
  • The Origin-stripping rule itself needs only declarativeNetRequest plus localhost host_permissions; the other permissions in the excerpt cover the rest of the extension.

// manifest.json (excerpt)
"permissions": ["storage", "activeTab", "declarativeNetRequest", "contextMenus"],
"host_permissions": ["http://127.0.0.1/*", "http://localhost/*"],
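The first two key points can be sketched as a pair of pure helpers that are easy to test outside the browser. The names `buildOriginRule` and `configChanged` and the local `OriginRule` type are hypothetical; the commented wiring at the bottom uses the real `chrome.storage.onChanged` event and assumes the `syncOriginRule` function shown earlier.

```typescript
// Minimal local type so the sketch compiles without @types/chrome.
interface OriginRule {
  id: number;
  priority: number;
  condition: { urlFilter: string; resourceTypes: string[] };
  action: {
    type: 'modifyHeaders';
    requestHeaders: { header: string; operation: 'remove' }[];
  };
}

// Build the host-scoped rule for whatever endpoint the user configured.
function buildOriginRule(endpoint: string, id: number = 1): OriginRule {
  const host = new URL(endpoint).host; // includes the port when it is non-default
  return {
    id,
    priority: 1,
    condition: { urlFilter: `||${host}/`, resourceTypes: ['xmlhttprequest'] },
    action: {
      type: 'modifyHeaders',
      requestHeaders: [{ header: 'origin', operation: 'remove' }],
    },
  };
}

type StorageChange = { oldValue?: unknown; newValue?: unknown };

// True when the synced `config` key changed and the rule needs rebuilding.
function configChanged(changes: Record<string, StorageChange>, area: string): boolean {
  return area === 'sync' && Object.prototype.hasOwnProperty.call(changes, 'config');
}

// Wiring inside the service worker (not executed in this sketch):
// chrome.storage.onChanged.addListener((changes, area) => {
//   if (configChanged(changes, area)) void syncOriginRule();
// });

console.log(buildOriginRule('http://127.0.0.1:11434/v1').condition.urlFilter);
// → "||127.0.0.1:11434/"
```

Keeping the rule construction separate from the `chrome.*` calls means the scoping logic, which is the security-relevant part, can be covered by ordinary unit tests.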

The result: the user's steps are "install Ollama, ollama serve, install the extension." Zero env vars.

MV3 architecture: do the fetch in the service worker

One more thing. The actual fetch (the request to 127.0.0.1) happens in the service worker, not the content script:

// content → background message; background runs the check and replies
chrome.runtime.onMessage.addListener((msg, _sender, sendResponse) => {
  if (msg?.type !== 'inline-scribe:check') return undefined;
  (async () => {
    const config = { ...DEFAULT_CONFIG, ...(await chrome.storage.sync.get('config')).config };
    try {
      const corrected = await new OllamaChecker(config).check(msg.text);
      sendResponse({ ok: true, corrected, model: config.model });
    } catch (err) {
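      // Pass the error text (e.g. a CheckerError message) back to the content script
      sendResponse({ ok: false, error: err instanceof Error ? err.message : String(err) });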
    }
  })();
  return true; // keep the channel open for the async response
});
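The content-script half of that exchange might look like the sketch below. The `CheckResponse` type mirrors the two `sendResponse` shapes in the listener; `unwrapCheckResponse` and `requestCheck` are hypothetical names, and `send` is injected so the code can run outside the browser (in the extension it would be `chrome.runtime.sendMessage`, which returns a promise in MV3 when no callback is passed).

```typescript
// Mirrors the two sendResponse shapes used by the background listener.
type CheckResponse =
  | { ok: true; corrected: string; model: string }
  | { ok: false; error: string };

type CheckMessage = { type: 'inline-scribe:check'; text: string };

// Turn the reply into either the corrected text or a thrown Error.
function unwrapCheckResponse(resp: CheckResponse | undefined): string {
  if (!resp) throw new Error('No response from the background service worker');
  if (!resp.ok) throw new Error(resp.error);
  return resp.corrected;
}

// `send` stands in for chrome.runtime.sendMessage.
async function requestCheck(
  text: string,
  send: (msg: CheckMessage) => Promise<CheckResponse | undefined>,
): Promise<string> {
  return unwrapCheckResponse(await send({ type: 'inline-scribe:check', text }));
}

// In the content script (not executed in this sketch):
// const corrected = await requestCheck(field.value, (m) => chrome.runtime.sendMessage(m));
```

The `undefined` branch covers the case where no listener answered, for example when the service worker failed to start.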

Two reasons:

  • It isn't bound by the page's CSP. A fetch from a content script gets blocked when the page's Content-Security-Policy restricts connect-src. The service worker runs in the extension's context and is unaffected.
  • It plays well with the DNR Origin strip. Chrome classifies a fetch() from the extension's service worker under the xmlhttprequest resource type, so the rule above applies cleanly.

The UI (review panel, the ✎ selection icon, in-place replacement) is rendered in a shadow DOM from the content script so it doesn't collide with the page's CSS.

Wrapping up

inline-scribe is, at its core, "Grammarly's diff UX on top of a local LLM." The design decisions that made it work:

  • Don't let the LLM build the diff — use a deterministic algorithm. The UI never breaks on small models, it's model-agnostic, and it's easy to test.
  • Strip the Origin with DNR. No OLLAMA_ORIGINS, zero config.
  • Fetch in the service worker. Not bound by page CSP.

Swap the system prompt and the same diff UI becomes a translator or a tone-shifter. The source is MIT-licensed.

If you're putting a local LLM into a product, the leverage is in deciding what the model does — and what it doesn't.

Source: dev.to
