Solving the Dual Write Problem Without Losing Data

Distributed systems fail in uncomfortable ways.

Sometimes the database commit succeeds — but the Kafka publish fails.

Sometimes the event is published — but the transaction rolls back.

And sometimes everything looks successful… until downstream systems realize data is missing.

This is the dual write problem.

If your Go microservice:

writes to a database
publishes events
triggers async workflows
integrates with Kafka/RabbitMQ/NATS

then you are already dealing with it — whether you realize it or not.

This article explores how the Outbox Pattern solves this problem safely in production Go systems.

1. The Dual Write Problem

Consider a typical order flow:

HTTP Request
    ↓
Save order to DB
    ↓
Publish "OrderCreated" event

Naive implementation:

func CreateOrder(ctx context.Context, order Order) error {
    err := db.Insert(order)
    if err != nil {
        return err
    }

    err = kafka.Publish("order.created", order)
    if err != nil {
        return err
    }

    return nil
}

Looks harmless.

But what happens if:

DB insert succeeds
Kafka publish fails

Now:

order exists
no event emitted
downstream services never know

Your system is inconsistent.
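The failure mode above can be reproduced without any real infrastructure. The following is a minimal sketch that assumes in-memory stand-ins: `fakeDB`, `fakeBroker`, `createOrderNaive`, and `simulateOutage` are hypothetical names used only for illustration, not real client APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// Order is a hypothetical, trimmed-down order record.
type Order struct {
	ID     string
	Amount int
}

// fakeDB is an in-memory stand-in for the orders table.
type fakeDB struct{ orders map[string]Order }

func (d *fakeDB) Insert(o Order) error {
	d.orders[o.ID] = o
	return nil
}

// fakeBroker is an in-memory stand-in for Kafka that can be switched off.
type fakeBroker struct {
	down   bool
	events []string
}

func (b *fakeBroker) Publish(topic string, o Order) error {
	if b.down {
		return errors.New("broker unavailable")
	}
	b.events = append(b.events, topic+":"+o.ID)
	return nil
}

// createOrderNaive mirrors the dual write: two independent writes with
// nothing tying their outcomes together.
func createOrderNaive(db *fakeDB, broker *fakeBroker, o Order) error {
	if err := db.Insert(o); err != nil {
		return err
	}
	return broker.Publish("order.created", o)
}

// simulateOutage runs the naive flow while the broker is down and reports
// how many orders and events exist afterwards.
func simulateOutage() (orders, events int, err error) {
	db := &fakeDB{orders: map[string]Order{}}
	broker := &fakeBroker{down: true}
	err = createOrderNaive(db, broker, Order{ID: "o-1", Amount: 100})
	return len(db.orders), len(broker.events), err
}

func main() {
	orders, events, err := simulateOutage()
	// Prints: orders=1 events=0 err=broker unavailable
	fmt.Printf("orders=%d events=%d err=%v\n", orders, events, err)
}
```

The caller receives an error, yet the order row is already stored. Returning the error does not undo the insert, which is exactly the inconsistency described above.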

2. Why Distributed Transactions Are Rarely the Answer

Some engineers try:

two-phase commit
distributed transactions
XA protocols

In practice:

operationally complex
poor performance
difficult to scale
unsupported by many systems

Modern systems usually prefer:

eventual consistency
reliable event delivery

This is where the Outbox Pattern shines.

3. Core Idea of the Outbox Pattern

Instead of:

DB write
+
Kafka publish

Do:

DB write
+
Insert event into outbox table

inside the SAME transaction.

Then:

background worker publishes events later

Now:

either both persist
or neither persists

Atomicity restored.

4. Outbox Table Design

Typical schema:

CREATE TABLE outbox_events (
    id UUID PRIMARY KEY,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP NOT NULL,
    processed_at TIMESTAMP,
    retries INT DEFAULT 0
);

Key fields:

payload
processed status
retry count
timestamps

This table becomes a durable event queue.
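On the Go side, a row of this table can be modeled as a small struct. The sketch below is one possible mapping, not a prescribed one: `OutboxEvent`, `NewOutboxEvent`, and `newID` are illustrative names, and the ID is generated with the standard library so that the example has no external dependencies.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"time"
)

// OutboxEvent mirrors one row of the outbox_events table.
type OutboxEvent struct {
	ID          string
	EventType   string
	Payload     json.RawMessage
	CreatedAt   time.Time
	ProcessedAt *time.Time // nil while the event is still pending
	Retries     int
}

// newID returns a random version 4 UUID string built from crypto/rand.
func newID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// NewOutboxEvent serializes body to JSON and wraps it in a pending event.
func NewOutboxEvent(eventType string, body any) (OutboxEvent, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return OutboxEvent{}, err
	}
	id, err := newID()
	if err != nil {
		return OutboxEvent{}, err
	}
	return OutboxEvent{
		ID:        id,
		EventType: eventType,
		Payload:   payload,
		CreatedAt: time.Now().UTC(),
	}, nil
}

// Pending reports whether the event still needs to be published.
func (e OutboxEvent) Pending() bool { return e.ProcessedAt == nil }

func main() {
	e, err := NewOutboxEvent("order.created", map[string]int{"amount": 5})
	if err != nil {
		panic(err)
	}
	fmt.Println(e.EventType, string(e.Payload), e.Pending())
}
```

Using a pointer for `ProcessedAt` keeps the "not yet processed" state explicit, matching the nullable `processed_at` column.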

5. Writing to the Outbox (Go Example)

Inside transaction:

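// Assumed imports: context, database/sql, encoding/json, and
// github.com/google/uuid (for uuid.New).
func CreateOrder(ctx context.Context, db *sql.DB, order Order) error {
    // Serialize first so a bad payload fails before a transaction is opened.
    payload, err := json.Marshal(order)
    if err != nil {
        return err
    }

    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }

    // Rollback has no effect once Commit has succeeded.
    defer tx.Rollback()

    _, err = tx.ExecContext(ctx,
        `INSERT INTO orders(id, amount) VALUES($1, $2)`,
        order.ID,
        order.Amount,
    )
    if err != nil {
        return err
    }

    // Same transaction: the event row commits or rolls back with the order.
    _, err = tx.ExecContext(ctx,
        `INSERT INTO outbox_events(id, event_type, payload, created_at)
         VALUES($1, $2, $3, NOW())`,
        uuid.New(),
        "order.created",
        payload,
    )
    if err != nil {
        return err
    }

    return tx.Commit()
}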

Now:

order + event persist atomically

No dual write inconsistency.

6. Background Publisher Worker

Separate worker:

func StartOutboxPublisher(ctx context.Context, db *sql.DB) {
    ticker := time.NewTicker(2 * time.Second)
    // Release the ticker's resources when the worker stops.
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            publishPendingEvents(ctx, db)
        }
    }
}
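The same loop can be written so that the polled function is injected, which makes the shutdown behavior easy to exercise without a database. This is a sketch under that assumption: `runEvery` and `countRuns` are hypothetical helpers, and the one-millisecond interval exists only to keep the demonstration fast.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls fn once per interval until ctx is cancelled.
func runEvery(ctx context.Context, interval time.Duration, fn func(context.Context)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			fn(ctx)
			// Stop right away if fn (or anyone else) cancelled the context,
			// instead of racing against the next tick.
			if ctx.Err() != nil {
				return
			}
		}
	}
}

// countRuns drives runEvery with a fast ticker, cancels the context after
// n calls, and returns how many times the function actually ran.
func countRuns(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	calls := 0
	runEvery(ctx, time.Millisecond, func(context.Context) {
		calls++
		if calls == n {
			cancel()
		}
	})
	return calls
}

func main() {
	fmt.Println(countRuns(3)) // prints 3
}
```

In a real service the context would more likely come from `signal.NotifyContext`, so that SIGTERM from the orchestrator stops the worker between polls.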

7. Publishing Pending Events

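// Assumes the standard library "log" package is imported.
func publishPendingEvents(ctx context.Context, db *sql.DB) {
    // Oldest first, so events are published roughly in creation order.
    rows, err := db.QueryContext(ctx,
        `SELECT id, event_type, payload
         FROM outbox_events
         WHERE processed_at IS NULL
         ORDER BY created_at
         LIMIT 100`)
    if err != nil {
        log.Printf("outbox: query failed: %v", err)
        return
    }

    defer rows.Close()

    for rows.Next() {
        var (
            id        string
            eventType string
            payload   []byte
        )

        if err := rows.Scan(&id, &eventType, &payload); err != nil {
            log.Printf("outbox: scan failed: %v", err)
            continue
        }

        // On failure the row stays unprocessed; the next tick retries it.
        if err := kafka.Publish(eventType, payload); err != nil {
            log.Printf("outbox: publish %s failed: %v", id, err)
            continue
        }

        // If this update fails, the event is published again on the next
        // tick, so consumers must tolerate duplicates.
        if _, err := db.ExecContext(ctx,
            `UPDATE outbox_events
             SET processed_at = NOW()
             WHERE id = $1`,
            id,
        ); err != nil {
            log.Printf("outbox: marking %s processed failed: %v", id, err)
        }
    }

    if err := rows.Err(); err != nil {
        log.Printf("outbox: iteration failed: %v", err)
    }
}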

Now failures become recoverable:

if Kafka fails → the row stays unprocessed and is retried on the next tick
a committed event stays in the table until it has been published

8. The Hidden Problem: Duplicate Delivery

Outbox guarantees:

at-least-once delivery

Not:

exactly once

This means:

consumers may receive duplicates, for example when the worker publishes an event but crashes before marking it as processed

Consumers MUST be idempotent.

This connects directly to:

retries
idempotency keys
distributed consistency
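One common way to make a consumer idempotent is to remember which event IDs it has already applied. The sketch below assumes that the outbox row `id` travels with each message (for example as a header) and keeps the "seen" set in memory. `Event`, `Inventory`, and `Handle` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical message shape; ID is the outbox row id that is
// assumed to be delivered together with the payload.
type Event struct {
	ID  string
	SKU string
	Qty int
}

// Inventory applies each reservation at most once per event ID.
type Inventory struct {
	mu       sync.Mutex
	reserved map[string]int
	seen     map[string]bool
}

func NewInventory() *Inventory {
	return &Inventory{
		reserved: map[string]int{},
		seen:     map[string]bool{},
	}
}

// Handle applies the event and returns true, or returns false when the
// event ID was already processed and the event is skipped as a duplicate.
func (inv *Inventory) Handle(e Event) bool {
	inv.mu.Lock()
	defer inv.mu.Unlock()

	if inv.seen[e.ID] {
		return false
	}
	inv.reserved[e.SKU] += e.Qty
	inv.seen[e.ID] = true
	return true
}

// Reserved returns the quantity currently reserved for a SKU.
func (inv *Inventory) Reserved(sku string) int {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.reserved[sku]
}

func main() {
	inv := NewInventory()
	e := Event{ID: "evt-1", SKU: "widget", Qty: 2}

	fmt.Println(inv.Handle(e))          // true: first delivery is applied
	fmt.Println(inv.Handle(e))          // false: redelivery is ignored
	fmt.Println(inv.Reserved("widget")) // 2, not 4
}
```

In a real consumer the set of seen IDs would usually live in the consumer's own database and be updated in the same transaction as the state change, so that a crash cannot separate the two.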

9. Handling Retries Properly

Never retry infinitely without control.

Track retries:

retries INT DEFAULT 0

Update:

UPDATE outbox_events
SET retries = retries + 1
WHERE id = $1

Eventually:

dead-letter queue
manual inspection
alerting
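A retry policy along these lines can be expressed as two small functions: one that spaces attempts out and one that decides when to give up. This is a sketch with assumed values. `maxRetries`, `baseDelay`, and `maxDelay` are illustrative constants, and acting on the computed delay would require an extra column such as a next-attempt timestamp, which the schema above does not have.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxRetries = 5               // assumed limit before dead-lettering
	baseDelay  = 2 * time.Second // assumed delay before the first retry
	maxDelay   = time.Minute     // assumed upper bound on any delay
)

// backoff returns the wait before the next attempt: baseDelay doubled once
// per previous retry, capped at maxDelay.
func backoff(retries int) time.Duration {
	d := baseDelay
	for i := 0; i < retries; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// shouldDeadLetter reports whether an event has used up its retries and
// should be moved aside for inspection instead of being retried again.
func shouldDeadLetter(retries int) bool {
	return retries >= maxRetries
}

func main() {
	for r := 0; r <= maxRetries; r++ {
		fmt.Printf("retries=%d wait=%v deadLetter=%v\n",
			r, backoff(r), shouldDeadLetter(r))
	}
}
```

Capping the delay keeps a long outage from pushing retries hours into the future, while the retry limit ensures that a permanently failing event eventually reaches a human.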

10. Polling vs CDC (Change Data Capture)

Simple approach:

polling outbox table

Advanced approach:

Debezium / WAL streaming
CDC-based event publishing

Tradeoff:

Polling          CDC
Simple           Complex
Easier ops       Higher throughput
Slight latency   Near real-time

Most systems should start with polling.

11. Concurrency Pitfall: Multiple Workers

If multiple publisher instances run:

Two workers may publish the same event.

Solution:

row locking

Example:

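SELECT id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED

The row locks last only until the surrounding transaction ends, so the worker has to select, publish, and mark the rows as processed inside one transaction.

This is critical in Kubernetes deployments, where several replicas of the publisher usually run at the same time.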
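In Go, claiming a batch with row locks means running the query and the updates on the same `*sql.Tx`. The following is a sketch that assumes PostgreSQL and a hypothetical `Publisher` interface in place of a concrete broker client. It compiles with the standard library alone, but needs a registered database driver to run against a real database.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// Publisher is a hypothetical abstraction over the broker client.
type Publisher interface {
	Publish(ctx context.Context, topic string, payload []byte) error
}

const batchLimit = 100

// claimBatchSQL locks up to $1 pending rows and skips rows that another
// worker has already locked (PostgreSQL syntax).
const claimBatchSQL = `
SELECT id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT $1
FOR UPDATE SKIP LOCKED`

type claimedEvent struct {
	id, eventType string
	payload       []byte
}

// publishBatch claims a batch, publishes it, and marks published rows as
// processed, all inside one transaction so the locks cover the whole cycle.
func publishBatch(ctx context.Context, db *sql.DB, pub Publisher) (int, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // has no effect after a successful Commit

	rows, err := tx.QueryContext(ctx, claimBatchSQL, batchLimit)
	if err != nil {
		return 0, err
	}
	// Read the whole batch first: many drivers cannot run another statement
	// on the same connection while a result set is still open.
	var batch []claimedEvent
	for rows.Next() {
		var e claimedEvent
		if err := rows.Scan(&e.id, &e.eventType, &e.payload); err != nil {
			rows.Close()
			return 0, err
		}
		batch = append(batch, e)
	}
	rows.Close()
	if err := rows.Err(); err != nil {
		return 0, err
	}

	published := 0
	for _, e := range batch {
		if err := pub.Publish(ctx, e.eventType, e.payload); err != nil {
			continue // row stays pending and is retried by a later batch
		}
		if _, err := tx.ExecContext(ctx,
			`UPDATE outbox_events SET processed_at = NOW() WHERE id = $1`,
			e.id,
		); err != nil {
			// Rolling back releases the locks; events already published in
			// this batch will be sent again (at-least-once delivery).
			return 0, err
		}
		published++
	}
	return published, tx.Commit()
}

func main() {
	// Running publishBatch requires a real *sql.DB; here only the claim
	// query is printed.
	fmt.Println(claimBatchSQL)
}
```

Holding the transaction open while publishing keeps other workers away from the claimed rows, at the cost of longer transactions. Keeping the batch size modest limits that cost.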

12. Observability Matters

Track:

pending outbox size
retry count
oldest unprocessed event
publish latency
dead-letter count

Danger signal:

growing outbox table

This means events are being written faster than they are published: the broker is unhealthy or the publisher has stalled.
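Two of these signals, pending size and the age of the oldest unprocessed event, can be read with a single query. The sketch below assumes PostgreSQL. `OutboxStats`, `queryStats`, and `Unhealthy` are hypothetical names, and the thresholds in `main` are arbitrary examples rather than recommendations.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// OutboxStats is a hypothetical snapshot of outbox health.
type OutboxStats struct {
	Pending          int64   // rows with processed_at IS NULL
	OldestAgeSeconds float64 // age of the oldest pending row, 0 if none
}

// statsSQL counts pending rows and measures the oldest one (PostgreSQL).
const statsSQL = `
SELECT COUNT(*),
       COALESCE(EXTRACT(EPOCH FROM (NOW() - MIN(created_at))), 0)
FROM outbox_events
WHERE processed_at IS NULL`

// queryStats reads the current snapshot; it needs a real *sql.DB with a
// registered driver, so it is not called in main below.
func queryStats(ctx context.Context, db *sql.DB) (OutboxStats, error) {
	var s OutboxStats
	err := db.QueryRowContext(ctx, statsSQL).Scan(&s.Pending, &s.OldestAgeSeconds)
	return s, err
}

// Unhealthy reports whether either the backlog size or the age of the
// oldest pending event exceeds its threshold.
func (s OutboxStats) Unhealthy(maxPending int64, maxAgeSeconds float64) bool {
	return s.Pending > maxPending || s.OldestAgeSeconds > maxAgeSeconds
}

func main() {
	// A small backlog can still be a problem if its oldest event is stale.
	s := OutboxStats{Pending: 12, OldestAgeSeconds: 900}
	fmt.Println(s.Unhealthy(1000, 300)) // prints true
}
```

Alerting on the age of the oldest event catches a stalled publisher even when traffic is low and the table stays small.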

13. Real Production Failure Story

Classic outage pattern:

Kafka degraded
API kept accepting writes
events silently failed
downstream inventory never updated

Without outbox:

permanent inconsistency

After outbox:

events queued safely
Kafka recovered later
system healed automatically

This is resilience.

14. Production Lessons

The outbox pattern teaches an important engineering truth:

Reliability is not preventing failure.

It’s surviving failure without losing correctness.

Distributed systems WILL:

retry
duplicate
reorder
partially fail

Your architecture must expect this.

Final Thoughts

The Outbox Pattern is one of the most important patterns in modern backend engineering.

It solves:

dual write inconsistency
event loss
partial failures

But it also forces you to think carefully about:

idempotency
retries
observability
operational recovery

Reliable distributed systems are not built by hoping failures won’t happen.

They are built by assuming they absolutely will.
