Flowork: Self-Hosted AI Stack with Sovereign Agent OS and LLM Gateway

What Is Flowork?

Flowork is a self-hosted AI infrastructure framework built on two core components: Flowork Agent (a lightweight agent operating system) and Flow Router (an LLM gateway). Each ships as a single Go binary, can run fully offline, and keeps your data inside your own infrastructure. No external API is called unless you explicitly route to one.

The appeal is straightforward: you get a sovereign AI stack where compute, models, and data remain under your control.

The Architecture: Agent OS + Gateway

Flowork Agent acts as the foundational orchestration layer. It handles task scheduling, context management, and agent lifecycle—the OS-level primitives you'd build manually in most self-hosted setups. Rather than writing orchestration glue yourself, you get a pre-built runtime.

Flow Router sits between your agents and the models, acting as your LLM gateway. It routes inference requests to local models, remote endpoints, or a mix of both. You define routing policies (latency, cost, model capability) without redeploying agents.

Both are single Go binaries. No Docker orchestration overhead, no JVM startup tax, no Python interpreter juggling. That matters for reproducibility and operational simplicity.

Data Privacy & Offline-First Design

This architecture assumes you want inference to happen inside your perimeter. You can run Flowork Agent + Flow Router on a single machine, a private Kubernetes cluster, or an air-gapped environment. No telemetry phoning home, no inference logs shipped to a vendor SaaS.

Real trade-off: You own the operational burden. No managed scaling, no vendor support line, no automatic model updates. You patch, monitor, and upgrade the stack yourself.

Practical Considerations

What works well:

Predictable latency and cost (no per-token billing to external APIs)
Full audit trail of inference requests and decisions
Ability to swap underlying models without changing application code
Small footprint for the orchestration layer (the Go binaries are lean; model serving still dominates resource use)

What requires planning:

Local LLM serving (Ollama, vLLM, or similar) must be provisioned separately
Scaling across machines requires network coordination; load balancing is not automatic
Model fine-tuning or custom training is your responsibility
Observability tooling (logging, metrics) is yours to integrate

When Flowork Makes Sense

Use this stack if:

You process sensitive data that cannot leave your infrastructure
You need deterministic, auditable AI inference pipelines
You're already running self-hosted infrastructure and want to avoid vendor lock-in
You can operate a small distributed system

Skip it if:

You want zero operational overhead (managed services are faster to deploy)
You need cutting-edge model access (managed platforms usually ship new models first)
Your team is small and has no ops capacity

Getting Started

The typical workflow:

Deploy Flow Router binary on your gateway host
Deploy Flowork Agent binaries on compute nodes
Point agents at local or self-hosted LLM inference servers
Define routing policies in configuration
Call agents via their API

No SDKs required for basic use; JSON over HTTP is the interface.

The Honest Assessment

Flowork is not a "one-click AI" solution. It's infrastructure for teams who understand the trade-offs between sovereignty and operational complexity. The Go binary approach is genuinely smart—you get portability without the baggage of heavier runtimes. And the agent + gateway separation is sound architecture.

But success depends entirely on your willingness to operate it. If you're evaluating self-hosted AI stacks, measure Flowork against your actual data residency requirements and operational capacity. It's a solid choice for the right problem, not a universal upgrade.

Both Flowork products are open source:

🤖 Flowork Agent (the self-hosted agent OS): https://github.com/flowork-os/Flowork_Agent
🛣️ Flow Router (the sovereign LLM gateway): https://github.com/flowork-os/flowork_Router