Darkbloom – Private inference on idle Macs
We present Darkbloom, a decentralized inference network. AI compute today flows through three layers of markup: GPU manufacturers sell to hyperscalers, hyperscalers rent capacity to API providers, and API providers charge end users per token. Meanwhile, over 100 million Apple Silicon machines sit idle for most of each day. We built a network that connects them directly to demand. Operators cannot observe inference data, the API is OpenAI-compatible, our measurements show up to 70% lower costs than centralized alternatives, and operators retain 95% of revenue.
Inference at half the cost
Idle hardware has near-zero marginal cost, and that saving passes through to price. The OpenAI-compatible API covers chat, image generation, and speech-to-text, and every request is end-to-end encrypted.
Earn USD from idle Apple Silicon
Your Mac already has the hardware. Operators keep 100% of inference revenue. Electricity cost on Apple Silicon runs $0.01–0.03 per hour depending on workload. The rest is profit.
NVIDIA sells GPUs to hyperscalers. AWS, Google, Azure, and CoreWeave mark them up and rent capacity to AI companies. AI companies mark them up again and charge end users per token. Each layer takes a cut. End users pay multiples of what the silicon actually costs to run.
Meanwhile, Apple has shipped over 100 million machines with serious ML hardware. Unified memory architectures. 273 to 819 GB/s memory bandwidth. Neural Engines. Machines capable of running 235-billion-parameter models. Most sit idle 18 or more hours a day. Their owners earn nothing from this compute.
That is not a technology problem. It is a marketplace problem.
The pattern is familiar. Airbnb connected idle rooms to travelers. Uber connected idle cars to riders. Rooftop solar turned idle rooftops into energy assets. In each case, distributed idle capacity undercut centralized incumbents on price because the marginal cost was near zero.
Darkbloom does this for AI compute. Idle Macs serve inference. Users pay less because there is no hyperscaler in the middle. Operators earn from hardware they already own. Unlike those other networks, the operator cannot see the user's data.
The hard part is trust. You are sending prompts to a machine you do not own, operated by someone you have never met. Your company's internal data. Your users' conversations. Your competitive advantage, running on hardware in someone else's house.
No enterprise will do this without guarantees stronger than a terms-of-service document.
Without verifiable privacy, decentralized inference does not work.
Access path elimination
We eliminate every software path through which an operator could observe inference data. Four independent layers, each verifiable on its own.
Encrypted end-to-end
Requests are encrypted on the user's device before transmission. The coordinator routes ciphertext. Only the target node's hardware-bound key can decrypt.
Hardware-verified
Each node holds a key generated inside Apple's tamper-resistant secure hardware. The attestation chain traces back to Apple's root certificate authority.
Hardened runtime
The inference process is locked at the OS level. Debugger attachment is blocked. Memory inspection is blocked. The operator cannot extract data from a running process.
Traceable to hardware
Every response is signed by the specific machine that produced it. The full attestation chain is published. Anyone can verify it independently.
The operator runs your inference. They cannot see your data.
Prompts are encrypted before they leave your machine. The coordinator routes traffic it cannot read. The provider decrypts inside a hardened process it cannot inspect. The attestation chain is public.
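The flow above can be sketched end to end. This is a toy illustration, not Darkbloom's production scheme: the real network would use a hardware-bound asymmetric key and a standard AEAD, while this stdlib-only stand-in uses a shared symmetric key with a SHA-256 keystream and an HMAC tag. Its only purpose is to show that the coordinator handles nothing but ciphertext.

```python
import hashlib
import hmac
import secrets

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Derive n pseudorandom bytes from key+nonce (toy stand-in for an AEAD cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def encrypt(key: bytes, plaintext: bytes):
    nonce = secrets.token_bytes(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()  # integrity tag
    return nonce, ct, tag

def decrypt(key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered ciphertext")
    return bytes(a ^ b for a, b in zip(ct, keystream(key, nonce, len(ct))))

# In the real system this key lives in the node's secure hardware;
# here it is just random bytes.
node_key = secrets.token_bytes(32)

prompt = b"internal quarterly numbers: ..."
nonce, ct, tag = encrypt(node_key, prompt)

# The coordinator routes only (nonce, ct, tag) — it never sees the prompt.
assert ct != prompt
assert decrypt(node_key, nonce, ct, tag) == prompt
```

The design point the toy preserves: everything between the user's device and the target node is opaque bytes plus an integrity tag, so the routing layer can deliver traffic it cannot read.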
OpenAI-compatible API
Change the base URL. Everything else works. Streaming, function calling, all existing SDKs.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.darkbloom.dev/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="mlx-community/gemma-4-26b-a4b-it-8bit",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # delta.content is None on role and finish chunks
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
Cost comparison
Idle hardware has near-zero marginal cost, so the savings pass through. No subscriptions or minimums. Per-token pricing compared against OpenRouter equivalents.
| Model | Input | Output | OpenRouter output | Savings |
|---|---|---|---|---|
| Gemma 4 26B (4B active, fast multimodal MoE) | $0.03 | $0.20 | $0.40 | 50% |
| Qwen3.5 27B (dense, frontier reasoning) | $0.10 | $0.78 | $1.56 | 50% |
| Qwen3.5 122B MoE (10B active, best quality) | $0.13 | $1.04 | $2.08 | 50% |
| MiniMax M2.5 239B (11B active, SOTA coding) | $0.06 | $0.50 | $1.00 | 50% |
Prices per million tokens
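As a worked example of the table's arithmetic — the prices come from the Gemma 4 26B row above, while the request sizes are made up for illustration:

```python
# Per-million-token prices from the pricing table (Gemma 4 26B row).
DARKBLOOM_IN, DARKBLOOM_OUT = 0.03, 0.20  # $/Mtok
OPENROUTER_OUT = 0.40                      # $/Mtok

# Savings on output tokens: 1 - 0.20/0.40 = 50%, matching the table.
savings = 1 - DARKBLOOM_OUT / OPENROUTER_OUT

def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# A hypothetical 2,000-token-in / 500-token-out request:
cost = request_cost(2000, 500, DARKBLOOM_IN, DARKBLOOM_OUT)  # about $0.00016
```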
Additional pricing tabs: Image Generation · Speech-to-Text · Platform Fee
Operator economics
Operators contribute idle Apple Silicon and earn USD. 100% of inference revenue goes to the operator. The only variable cost is electricity.
Install via Terminal
Downloads the provider binary and configures a launchd service.
$ curl -fsSL https://api.darkbloom.dev/install.sh | bash

Earnings estimate
Projected operator earnings vary by hardware. Estimates only: actual results depend on network demand and model popularity, and assume you own the Mac.
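A back-of-envelope version of that estimate can be written down directly. The throughput, utilization, and model price below are illustrative assumptions, not Darkbloom measurements; only the electricity figure comes from this page.

```python
# Hypothetical earnings model: token revenue minus electricity for active hours.
# tok_per_sec, utilization, and price_per_mtok are assumed inputs, not measured.
HOURS_PER_MONTH = 730

def monthly_profit(tok_per_sec: float, utilization: float,
                   price_per_mtok: float, elec_per_hour: float) -> float:
    """Gross token revenue minus electricity cost over the active hours."""
    active_hours = HOURS_PER_MONTH * utilization
    tokens = tok_per_sec * 3600 * active_hours
    gross = tokens / 1e6 * price_per_mtok
    return gross - elec_per_hour * active_hours

# Assumed: 40 tok/s output, 25% utilization, a $0.20/Mtok model, and
# $0.02/hr electricity (mid-point of the $0.01–0.03 range on this page).
profit = monthly_profit(40, 0.25, 0.20, 0.02)
```

Because operators keep the inference revenue, nothing else is deducted; the estimate scales linearly with throughput, utilization, and model price.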
Read the research paper
Architecture specification, threat model, security analysis, and economic model for hardware-verified private inference on distributed Apple Silicon.
Available models
Curated for quality. Only models worth paying for.