Your browser is a neural network runtime now


Remember when running AI meant spinning up a Python server, renting a GPU, and crying over CUDA drivers? Yeah, that's optional now.
Transformers.js by Hugging Face loads ML models straight in the browser. No backend. No Python. No "please install PyTorch 2.1.3 with CUDA 11.8 on a full moon."
You point it at a model on Hugging Face Hub → it downloads ONNX weights → caches them in the browser → runs inference on WebAssembly/WebGPU. The first load takes a few seconds; every subsequent run works offline.
What kind of models?
Small, task-specific ones that do one thing really well. Background removal (RMBG-1.4, ~43MB — handles hair and fur better than some paid APIs). Portrait segmentation (MediaPipe, ~5MB — near instant). Plus classification, object detection, sentiment analysis, summarization — dozens of ready-to-go models on the Hub.
Not GPT in your browser. More like a Swiss army knife of tiny specialized models.
Why this is perfect for side projects
No server = no hosting bill. No API = no rate limits, no leaked keys. No uploads = real privacy, not "we totally won't look at your data" privacy. Wrap it as a PWA and congratulations — you just shipped an offline AI desktop app via a URL.
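The offline part is just a cache-first fetch strategy in the service worker. A toy, browser-free sketch of the idea (the function and names are mine — in a real PWA you'd wire this into the `fetch` event with the Cache API):

```javascript
// Cache-first lookup: serve from cache if present, otherwise fetch once
// and remember the result. This is what keeps model weights and the app
// shell available offline after the first load.
async function cacheFirst(cache, request, fetcher) {
  const hit = cache.get(request);
  if (hit !== undefined) return hit;     // offline path: no network needed
  const fresh = await fetcher(request);  // cold start: hit the network once
  cache.set(request, fresh);             // e.g. ONNX weights, HTML, JS
  return fresh;
}
```

Same shape as a service worker's `caches.match(...) || fetch(...)` dance, minus the browser globals.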
I tested this approach by building a background remover — RMBG-1.4 + MediaPipe client-side, with an editor for masks, backgrounds, shadows, text. Zero accounts, zero watermarks, works on a plane.
It's live at remify.xyz if you want to poke at it.


Processing takes a couple seconds because your CPU is doing what a server GPU used to. Honestly? Worth it for an app that has zero infrastructure to babysit.

Now go pick a model from the Hub and break something.

Source: dev.to
