Every crypto exchange speaks its own dialect. Binance sends depthUpdate messages with "b" and "a" arrays. Coinbase wraps updates in a channel/type envelope. OKX gzip-compresses its WebSocket frames. Bybit uses a different snapshot synchronization protocol than any of them. If you want to build anything that consumes order book data from multiple exchanges, you are stuck writing and maintaining a bespoke parser for each one, each with its own reconnect logic, snapshot sync state machine, and symbol naming convention.
We built Microverse to solve this problem: a single C++ pipeline that normalizes real-time order book data from 20 exchanges into a uniform stream, and serves it over a free WebSocket API. This article walks through the architecture, the hard engineering problems we hit, and the techniques we used to keep end-to-end latency under one millisecond.
The Pipeline at a Glance
The data path from exchange to client has seven stages:
Exchange WS → WS Driver → Parser → Book → SHM Ring → mdf_server → Gateway → Client
Each exchange runs as its own handler process. The handler connects to the exchange, parses messages, maintains a local order book, and writes normalized updates into a shared-memory ring buffer. A central mdf_server process reads from all 20 ring buffers and distributes updates to downstream consumers: a web dashboard, a WebSocket gateway for external clients, and internal analytics. No message broker. No serialization framework. Just lock-free shared memory and TCP.
Let us walk through each stage.
Stage 1: WebSocket Driver
Each handler spawns an SSL WebSocket connection to its exchange. The driver (mcast_websocket.cpp) handles the full lifecycle: TLS handshake, WebSocket frame decoding, ping/pong keepalives, and transparent decompression for exchanges that gzip their payloads (HTX, OKX, and others).
When a complete text frame arrives, the driver writes the raw JSON into a buffer and tags it with a port number: 1 for incremental depth updates, 2 for snapshots. Every message is also written to a binary capture file (24-byte header plus JSON payload) so we can replay production traffic through the pipeline deterministically during development.
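For illustration, the capture header might be laid out like this. Only the total size (24 bytes) and the port tag (1 = incremental, 2 = snapshot) come from the description above; the individual fields here are assumptions, not the real layout:

```cpp
#include <cstdint>

// Hypothetical layout for the 24-byte capture-file header. The real field
// set is not documented here; only the 24-byte size and the port tag
// (1 = incremental depth update, 2 = snapshot) come from the text.
#pragma pack(push, 1)
struct CaptureHeader {
    uint64_t recv_timestamp_ns; // receive time, nanoseconds
    uint32_t payload_len;       // length of the JSON payload that follows
    uint32_t port;              // 1 = incremental, 2 = snapshot
    uint64_t sequence;          // monotonic capture sequence number
};
#pragma pack(pop)

static_assert(sizeof(CaptureHeader) == 24, "header must be exactly 24 bytes");
```

A fixed-size packed header like this makes replay trivial: read 24 bytes, read `payload_len` more, feed the JSON through the parser, repeat.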
The driver also runs a separate snapshot thread. On initial subscription or when a sequence gap is detected, it makes an HTTPS REST call to fetch a full book snapshot and pushes it into an internal message queue, which the main recv() loop picks up on the next iteration.
Stage 2: Parser (simdjson)
Each exchange has a dedicated parser class (e.g., BinanceParser, CoinbaseParser, KrakenParser) that implements a common interface: processPacket(buffer, len, channel). The parser's job is to extract price level updates from exchange-specific JSON and translate them into uniform levelAdd / levelDelete calls on the book.
We use simdjson for JSON parsing. It processes JSON at gigabytes per second using SIMD instructions, which matters when you are parsing hundreds of thousands of messages per second across all exchanges. One critical lesson we learned the hard way: simdjson's on-demand parser modifies the buffer in-place during string unescaping. The escaped bytes \"bids\" get rewritten to bids\0..., destroying the structural quote characters. A second iterate() call over the same buffer silently returns zero results. Every parser must do exactly one parse pass per buffer.
The parser also handles the trickiest part of exchange integration: price normalization. All prices are converted to fixed-point integers with 8 decimal places. The string "98500.12" becomes the integer 9850012000000. This eliminates floating-point comparison issues entirely and keeps the book operations branch-free.
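A minimal sketch of that conversion, assuming positive prices (the production parser presumably works on raw bytes rather than `std::string`, but the arithmetic is the same):

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Convert a decimal price string to a fixed-point integer with 8 implied
// decimal places, e.g. "98500.12" -> 9850012000000. Sketch only: assumes a
// well-formed, non-negative price string.
int64_t toFixedPoint8(const std::string& s) {
    const size_t dot = s.find('.');
    const std::string intPart = (dot == std::string::npos) ? s : s.substr(0, dot);
    std::string fracPart = (dot == std::string::npos) ? "" : s.substr(dot + 1);
    fracPart.resize(8, '0');  // pad or truncate to exactly 8 decimal digits
    return std::strtoll(intPart.c_str(), nullptr, 10) * 100000000LL
         + std::strtoll(fracPart.c_str(), nullptr, 10);
}
```

With every price in this form, book operations reduce to 64-bit integer comparisons, and two sources quoting the same price always compare equal.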
Stage 3: Snapshot Synchronization
Every exchange uses a variant of the same pattern: subscribe to a WebSocket stream of incremental updates, fetch a REST snapshot to establish a baseline, then apply only those incremental updates whose sequence numbers come after the snapshot.
The devil is in the details. Binance gives you an updateId on both snapshots and deltas; you buffer deltas until the snapshot arrives, discard any with updateId <= snapshot.lastUpdateId, and apply the rest in order. If you detect a gap (updateId != lastUpdateId + 1), you need to re-snapshot. OKX uses a checksum field you can validate against. Coinbase has a completely different sequencing model.
Each parser maintains per-symbol sync state:
struct SymbolState {
    bool snapshot_synced;
    bool needs_resnapshot;
    uint64_t seq_last_applied;
    std::vector<PendingUpdate> pending_updates;
};
When the parser detects a gap or stale data, it sets needs_resnapshot = true. The handler's main loop polls for this via popResnapshot() and triggers a new REST snapshot fetch. Until the snapshot arrives and sync is re-established, all incremental updates for that symbol are silently dropped. This is a deliberate design choice: we would rather show stale data for a fraction of a second than apply updates to a book that is out of sync, which would produce silently wrong prices.
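The Binance-style rule described above can be sketched as two functions over that state. `SymbolState` is restated so the sketch compiles standalone, and the parsed price levels inside `PendingUpdate` are elided; this is an illustration of the sequencing logic, not the real parser code:

```cpp
#include <cstdint>
#include <vector>

struct PendingUpdate { uint64_t update_id; };  // parsed levels elided

struct SymbolState {
    bool snapshot_synced = false;
    bool needs_resnapshot = false;
    uint64_t seq_last_applied = 0;
    std::vector<PendingUpdate> pending_updates;
};

// Returns true when the delta should be applied to the book.
bool onDelta(SymbolState& st, const PendingUpdate& u) {
    if (!st.snapshot_synced) {                 // no baseline yet: buffer it
        st.pending_updates.push_back(u);
        return false;
    }
    if (u.update_id <= st.seq_last_applied)    // at or before snapshot: drop
        return false;
    if (u.update_id != st.seq_last_applied + 1) {  // gap: force a re-sync
        st.snapshot_synced = false;
        st.needs_resnapshot = true;
        return false;
    }
    st.seq_last_applied = u.update_id;         // in sequence: apply
    return true;
}

// Called when the REST snapshot arrives: replay buffered deltas newer than
// the snapshot, then resume normal sequencing.
void onSnapshot(SymbolState& st, uint64_t last_update_id) {
    st.seq_last_applied = last_update_id;
    st.snapshot_synced = true;
    st.needs_resnapshot = false;
    std::vector<PendingUpdate> pending;
    pending.swap(st.pending_updates);
    for (const PendingUpdate& u : pending)
        onDelta(st, u);                        // replay in arrival order
}
```

Note that a gap during replay sets `needs_resnapshot` again, which is exactly the "drop everything until re-sync" behavior described above.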
Stage 4: The Order Book
The book (mdf_book.h) stores price levels in a sorted linked list per side (bid/ask). When a parser calls levelAdd, the book finds or inserts the price level, updates its quantity, and calls a virtual priceLevelChanged() callback. When levelDelete is called (quantity goes to zero), the level is removed from the list and the same callback fires.
The linked list's nodes are allocated from a slab allocator (SlabbedVector) rather than a std::vector, which avoids pointer invalidation on growth. Slabs are allocated in fixed-size chunks (128 elements) and never freed until the container is destroyed. This gives us O(1) allocation, zero reallocation copies, and stable pointers.
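A minimal sketch of the slab idea (the real SlabbedVector's interface is not shown in this article, so the API here is illustrative): growth appends a new fixed-size chunk instead of reallocating, so a pointer handed out once stays valid forever.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Slab-backed storage in the spirit of SlabbedVector: elements live in
// fixed 128-element chunks, so growth never moves existing elements and
// pointers remain stable for the container's lifetime.
template <typename T, size_t SlabSize = 128>
class SlabbedVector {
public:
    T* allocate() {
        if (size_ % SlabSize == 0)  // current slab full: add a new chunk
            slabs_.push_back(std::make_unique<T[]>(SlabSize));
        T* p = &slabs_[size_ / SlabSize][size_ % SlabSize];
        ++size_;
        return p;
    }
    T& operator[](size_t i) { return slabs_[i / SlabSize][i % SlabSize]; }
    size_t size() const { return size_; }
private:
    std::vector<std::unique_ptr<T[]>> slabs_;  // chunks are never freed early
    size_t size_ = 0;
};
```

Contrast with std::vector: a push_back that triggers reallocation moves every element and invalidates every outstanding pointer, which is fatal when linked-list nodes point at each other.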
Stage 5: Shared-Memory Ring Buffers
This is where the latency story gets interesting. Each handler writes to a shared-memory ring buffer mapped at /dev/shm/<exchange>_response. The ring is a single-producer, single-consumer (SPSC) lock-free queue implemented with two cache-aligned atomic counters:
Offset 0: [header]
Offset 64: atomic<long> r // reader position (CACHE_ALIGNED)
Offset 128: atomic<long> w // writer position (CACHE_ALIGNED)
Offset 256+: [data: variable-length MDFMsg records]
The reader and writer positions are on separate cache lines (64-byte aligned) to eliminate false sharing. The writer advances with store(release), the reader reads with load(acquire). There are no locks, no syscalls, and no kernel involvement in the hot path. A Linux futex is used only when the reader has no data and wants to sleep rather than spin.
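The core of such a ring can be sketched in a few lines. This in-process version uses fixed-size slots for brevity, where the production ring holds variable-length records in /dev/shm and adds the futex-based sleep; the memory-ordering discipline is the same:

```cpp
#include <atomic>
#include <cstddef>

// Minimal single-producer/single-consumer ring: two cache-aligned atomic
// counters, release on the writer side, acquire on the reader side, no
// locks and no syscalls in the hot path.
template <typename T, size_t N>  // N must be a power of two
class SpscRing {
public:
    bool push(const T& v) {
        const long w = w_.load(std::memory_order_relaxed);
        if (w - r_.load(std::memory_order_acquire) == (long)N)
            return false;                            // ring full
        buf_[w & (N - 1)] = v;
        w_.store(w + 1, std::memory_order_release);  // publish to reader
        return true;
    }
    bool pop(T& out) {
        const long r = r_.load(std::memory_order_relaxed);
        if (w_.load(std::memory_order_acquire) == r)
            return false;                            // ring empty
        out = buf_[r & (N - 1)];
        r_.store(r + 1, std::memory_order_release);  // free slot for writer
        return true;
    }
private:
    alignas(64) std::atomic<long> r_{0};  // reader position, own cache line
    alignas(64) std::atomic<long> w_{0};  // writer position, own cache line
    T buf_[N];
};
```

The release store on `w_` guarantees the reader's acquire load sees the slot contents written before it; the symmetric pair on `r_` guarantees the writer never overwrites a slot the reader has not finished with.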
Messages are variable-length and written directly as packed C structs. A price level change is 48 bytes:
struct MDFPriceLevelChangeMsg {
    uint16_t    size;       // message size
    uint8_t     type;       // 42
    secid_t     secid;      // symbol ID
    side_t      side;       // BID=0, ASK=1
    price_t     price;      // fixed-point, 8 decimals
    int64_t     shares;     // quantity
    int         num_orders; // order count at level
    timestamp_t timestamp;  // nanoseconds
};
No serialization, no deserialization. The mdf_server reads the struct directly out of shared memory. This is true zero-copy: the data written by the handler is the exact byte layout read by the server.
The handler also batches writes to amortize the cost of the atomic store. It accumulates messages in a local buffer and flushes to the shared-memory ring when a threshold is reached or the main loop goes idle.
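The batching can be sketched like this; the 4 KiB threshold and the `flushToRing()` stub are illustrative assumptions, not the real implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Write batching: messages accumulate in a local buffer and are flushed to
// the shared-memory ring in one go, so the release-store on the writer
// counter is paid once per batch rather than once per message.
class BatchWriter {
public:
    void append(const void* msg, size_t len) {
        const uint8_t* p = static_cast<const uint8_t*>(msg);
        local_.insert(local_.end(), p, p + len);
        if (local_.size() >= kFlushThreshold) flush();
    }
    void flush() {                 // also called when the main loop goes idle
        if (local_.empty()) return;
        flushToRing(local_.data(), local_.size());
        local_.clear();
    }
    size_t flushed_bytes = 0;      // exposed for the sketch: bytes sent to ring
private:
    static constexpr size_t kFlushThreshold = 4096;  // assumed threshold
    void flushToRing(const uint8_t*, size_t len) { flushed_bytes += len; }
    std::vector<uint8_t> local_;
};
```

Flushing on idle matters as much as the threshold: during quiet periods a partial batch must not sit in the local buffer adding latency.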
Stage 6: mdf_server (Aggregator)
The mdf_server process attaches to all 20 handler ring buffers and runs a tight poll loop:
for each ring:
    while ring has data:
        read MDFMsg from ring
        route to subscribed clients via TCP
It maintains a subscription table mapping symbol IDs to connected clients. When a web dashboard or gateway subscribes to "binance:BTCUSDT", the server writes a subscription request into the handler's request ring (/dev/shm/binance_request). The handler picks it up, fetches a snapshot, builds the initial book, and writes a full MDFRefreshMsg (containing all bid and ask levels) back through the response ring. From that point on, incremental updates flow automatically.
The server also handles heartbeats, connection management, and a subscription protocol that lets clients dynamically add and remove symbols.
Stage 7: WebSocket Gateway
The gateway (mdf_gateway.cpp) connects to mdf_server as an internal TCP client, maintains its own in-memory copy of every book it subscribes to, and serves external clients over WebSocket with JSON payloads. It supports per-exchange subscriptions, consolidated cross-exchange views, and top-of-book snapshots.
The gateway includes an embedded HTML test page, so you can point a browser at it and immediately see live order books rendered with a cyberpunk-themed dashboard. But more practically, you can connect with any WebSocket client and get structured JSON updates.
Performance Characteristics
The pipeline achieves sub-millisecond end-to-end latency from exchange WebSocket receipt to client delivery. Here is where the time goes:
- SSL read + WebSocket decode: ~50-100us
- simdjson parse + book update: ~10-30us
- SHM ring write + read: ~1-5us
- TCP send to gateway/viewer: ~50-200us
The key design decisions that keep latency low:
No serialization layer. Messages are packed C structs written directly to shared memory and read directly by the consumer. No protobuf, no flatbuffers, no JSON encoding between internal components.
SPSC lock-free rings. The only synchronization primitive in the hot path is a pair of atomic load/store operations on cache-aligned counters. No mutexes, no condition variables.
Slab allocation. The order book never calls malloc or free in the hot path. Price levels are allocated from pre-allocated slabs that grow but never shrink.
Fixed-point arithmetic. All prices and quantities are 64-bit integers. No floating-point comparison, no rounding issues, no epsilon checks.
Per-exchange process isolation. Each handler is a separate OS process. A crash or hang in the Kraken parser does not affect Binance. The mdf_server simply stops seeing updates on that ring until the watchdog restarts the handler.
20 Exchanges, One API
The system currently normalizes data from: Ascendex, Binance, BingX, Bitfinex, Bitget, Bitmart, Bybit, Coinbase, CoinEx, Crypto.com, Gate.io, Gemini, HTX, Kraken, KuCoin, LBank, MEXC, OKX, Phemex, and Upbit. Each required writing a dedicated parser, figuring out its snapshot sync protocol, handling its compression scheme, and mapping its symbol naming convention to our normalized format.
Adding a new exchange typically takes a day of work: study the WebSocket API docs, write the parser class, add snapshot sync logic, test against captures, and deploy. The driver, book, ring buffer, and distribution layers are all reusable.
Try It
The WebSocket API is free and requires no authentication. Connect and subscribe to any symbol across any supported exchange:
const ws = new WebSocket('wss://api.microversesystems.com');
ws.onopen = () => {
  // Subscribe to BTC/USDT books from Binance and Coinbase
  ws.send(JSON.stringify({
    op: 'subscribe',
    symbols: ['binance:BTCUSDT', 'coinbase:BTC-USD']
  }));
};

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'book') {
    console.log(`${msg.exchange}:${msg.symbol}`);
    console.log(`  Best bid: ${msg.bids[0][0]} @ ${msg.bids[0][1]}`);
    console.log(`  Best ask: ${msg.asks[0][0]} @ ${msg.asks[0][1]}`);
  }
};
You can see live order books from all 20 exchanges on the dashboard, read the API documentation, or learn more at microversesystems.com.
Wrapping Up
Building a low-latency market data feed is not about any single optimization. It is about eliminating unnecessary work at every stage: no serialization overhead, no lock contention, no memory allocation in the hot path, no floating-point arithmetic. Each decision compounds.
The hardest part was not the C++ or the performance work. It was the exchange integration: 20 different WebSocket APIs, 20 different snapshot sync protocols, 20 different ways of naming BTC/USDT. That is the real engineering work, and it is the reason we built this as a service rather than a library. You should not have to reverse-engineer Phemex's sequence number semantics just to get a clean order book.
If you are building trading systems, analytics, or dashboards that need real-time crypto data, give the API a try. It is free, it is fast, and it covers 20 exchanges with a single WebSocket connection.