OpenDuck
An open-source implementation of the ideas pioneered by MotherDuck — differential storage, hybrid (dual) execution, and transparent remote databases for DuckDB — available for anyone to run, extend, and build on.
MotherDuck showed that DuckDB can work beautifully in the cloud: ATTACH 'md:mydb', and remote tables appear local. Queries split transparently across your laptop and the cloud. Storage is layered and snapshot-based. OpenDuck takes those architectural ideas — differential storage, dual execution, the attach-based UX — and makes them open. Open protocol, open backend, open extension.
import duckdb
con = duckdb.connect()
con.execute("LOAD 'openduck';")
con.execute("ATTACH 'openduck:mydb?endpoint=http://localhost:7878&token=xxx' AS cloud;")
con.sql("SELECT * FROM cloud.users").show() # remote, transparent
con.sql("SELECT * FROM local.t JOIN cloud.t2 ON ...").show() # hybrid, one query
# direct connect using openduck python library
con = openduck.connect("od:mydb")
con = openduck.connect("openduck:myd")
# direct connect using duckb (TODO: needs duckdb to autoload openduck the same way motherduck works today)
con = duckdb.connect("od:mydb")
con = duckdb.connect("openduck:myd")What OpenDuck does
Differential storage
Append-only layers with PostgreSQL metadata. DuckDB sees a normal file; OpenDuck persists data as immutable sealed layers addressable from object storage. Snapshots give you consistent reads. One serialized write path, many concurrent readers.
Hybrid (dual) execution
A single query can run partly on your machine and partly on a remote worker. The gateway splits the plan, labels each operator LOCAL or REMOTE, and inserts bridge operators at the boundaries. Only intermediate results cross the wire.
[LOCAL] HashJoin(l.id = r.id)
[LOCAL] Scan(products) ← your laptop
[LOCAL] Bridge(R→L)
[REMOTE] Scan(sales) ← remote worker
DuckDB-native catalog
The extension implements DuckDB's StorageExtension and Catalog interfaces. Remote tables are first-class catalog entries, they participate in JOINs, CTEs, and the optimizer like local tables.
Open protocol
OpenDuck's protocol is intentionally minimal: two RPCs defined in execution.proto. The first one to execute a query, and the other to stream results back as Arrow IPC batches.
Because the protocol is open and simple, you're not locked into a single backend. Any service that speaks gRPC and returns Arrow can serve as an OpenDuck-compatible backend. Run the included Rust gateway, replace it with your own implementation, or plug in an entirely different execution engine — the client and extension don't care what's on the other side.
Architecture
┌─────────────────────────────────────────────┐
│ DuckDB process (client) │
│ │
│ LOAD openduck │
│ ATTACH 'openduck:mydb' AS cloud │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ OpenDuckCatalog │ │
│ │ └─ OpenDuckSchemaEntry │ │
│ │ └─ OpenDuckTableEntry (users) │ │
│ │ └─ OpenDuckTableEntry (events) │ │
│ └──────────────┬──────────────────────┘ │
│ │ gRPC + Arrow IPC │
└─────────────────┼───────────────────────────┘
│
┌───────────▼───────────┐
│ Gateway (Rust) │
│ - auth, routing │
│ - plan splitting │ ┌──────────────┐
│ - backpressure │────▶│ Worker 1 │
│ │ │ (DuckDB) │
│ │ └──────────────┘
│ │ ┌──────────────┐
│ │────▶│ Worker N │
│ │ │ (DuckDB) │
└───────────────────────┘ └──────────────┘
│
┌─────────┴─────────┐
▼ ▼
┌──────────┐ ┌──────────────┐
│ Postgres │ │ Object store │
│ metadata │ │ sealed layers│
└──────────┘ └──────────────┘
Quick start
# Backend
cargo build --workspace
cargo run -p openduck -- serve -d mydb -t your-token
# Extension (requires vcpkg — see extensions/openduck/README.md)
cd extensions/openduck && make
# Python client
pip install -e clients/python
export OPENDUCK_TOKEN=your-token
python -c "
import openduck
con = openduck.connect('mydb')
con.sql('SELECT 1 AS x').show()
"Layout
crates/
exec-gateway/ Gateway — auth, routing, hybrid plan splitting
exec-worker/ Worker — embedded DuckDB, Arrow IPC streaming
exec-proto/ Protobuf/tonic codegen
openduck-cli/ Unified CLI (openduck serve|gateway|worker)
diff-*/ Differential storage pipeline (layers, metadata, FUSE)
extensions/
openduck/ DuckDB C++ extension (StorageExtension + Catalog)
clients/
python/ openduck Python package (pip install -e clients/python)
proto/
openduck/v1/ Protocol definition (execution.proto)
OpenDuck vs MotherDuck
MotherDuck is a commercial cloud service. OpenDuck is an open-source project inspired by its architecture.
| MotherDuck | OpenDuck | |
|---|---|---|
| What | Managed cloud service | Self-hosted open-source |
| Attach scheme | md: |
openduck: / od: |
| Auth | motherduck_token |
OPENDUCK_TOKEN |
| Differential storage | Proprietary | Open (Postgres metadata + object store) |
| Hybrid execution | Proprietary planner | Open (gateway + plan splitting) |
| Protocol | Private wire format | Open gRPC + Arrow IPC |
| Backend | MotherDuck's cloud | Anything implementing ExecutionService |
| Extension | Bundled in DuckDB | Separate loadable extension |
OpenDuck is not wire-compatible with MotherDuck. It reimplements the same architectural ideas as an open protocol.
OpenDuck vs Arrow Flight SQL
Arrow Flight SQL is a generic database protocol — "JDBC/ODBC over Arrow." OpenDuck is a DuckDB-specific system with a narrower scope but deeper integration.
| Arrow Flight SQL | OpenDuck | |
|---|---|---|
| Scope | Any SQL database | DuckDB-specific |
| Integration | Separate client driver | DuckDB StorageExtension + Catalog |
| Catalog | Server-side (GetTables, etc.) |
Extension-side (DuckDB catalog entries) |
| Execution | Full query on server | Hybrid — split across local and remote |
| Protocol surface | ~15 RPCs | 2 RPCs |
| Plan format | SQL only | SQL (M2), structured plan IR (M3) |
| Optimizer | Client-side, unaware | DuckDB optimizer sees remote tables natively |
Acknowledgments
OpenDuck's architecture draws heavily from MotherDuck's published work on differential storage, dual execution, and cloud-native DuckDB. Credit to the MotherDuck team for pioneering these ideas.
License
MIT