How Event Sourcing Actually Works

Traditional databases store current state: user.balance = 100. Event sourcing stores every change as an event: (deposit 50), (withdraw 30), ... Current state is the fold (a left-to-right reduction) of those events. It is a heavy abstraction, but a powerful one: you get an audit trail, time-travel debugging, and multiple projections from the same data. This guide covers the mechanics and when the cost is worth paying.

Traditional CRUD vs Event Sourcing

Traditional CRUD:
  A row in the users table = current state
  UPDATE users SET balance = 100 WHERE id = 42
  → Previous value is gone

Event Sourcing:
  Append-only events table for every change
  INSERT INTO events (id, type, data, ts) VALUES (
    1, 'AccountOpened', '{"user": 42, "initial": 0}', t0);
  INSERT INTO events (id, type, data, ts) VALUES (
    2, 'Deposited',     '{"user": 42, "amount": 50}', t1);
  INSERT INTO events (id, type, data, ts) VALUES (
    3, 'Withdrawn',     '{"user": 42, "amount": 30}', t2);

  → Current balance = fold(events) = 0 + 50 - 30 = 20
  → Every change preserved
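A minimal Python sketch of that fold, assuming events are plain dicts shaped like the rows above (the `apply` and `current_balance` names are illustrative, not from any library):

```python
from functools import reduce

# Hypothetical in-memory copy of the three rows inserted above.
events = [
    {"id": 1, "type": "AccountOpened", "data": {"user": 42, "initial": 0}},
    {"id": 2, "type": "Deposited", "data": {"user": 42, "amount": 50}},
    {"id": 3, "type": "Withdrawn", "data": {"user": 42, "amount": 30}},
]

def apply(balance: int, event: dict) -> int:
    """Step function of the fold: old state + one event -> new state."""
    kind, data = event["type"], event["data"]
    if kind == "AccountOpened":
        return data["initial"]
    if kind == "Deposited":
        return balance + data["amount"]
    if kind == "Withdrawn":
        return balance - data["amount"]
    raise ValueError(f"unknown event type: {kind}")

def current_balance(events: list) -> int:
    # Fold left over the log, starting from an empty account (0).
    return reduce(apply, events, 0)

print(current_balance(events))  # 20
```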

Core Idea — Events Are Source of Truth

Traditional:
  state = source of truth
  events are derived (an "UPDATE happened" log or audit table as byproduct)

Event Sourcing:
  events = source of truth
  state is derived (a cache built by replaying events from the start)

→ Reconstruct state at any point in time — "what was the balance at midnight?"
→ Every change carries its "why" via event type (Withdrawn vs Refunded vs Fee)
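To make "balance at midnight" concrete, here is a small sketch that replays only the events up to a cutoff. It assumes a hypothetical in-memory log already in append order with non-decreasing `ts` values; a real store would typically query by sequence number or timestamp instead.

```python
from datetime import datetime

# Hypothetical event log; each event carries its own timestamp.
events = [
    {"type": "AccountOpened", "amount": 0,  "ts": datetime(2024, 1, 1, 9, 0)},
    {"type": "Deposited",     "amount": 50, "ts": datetime(2024, 1, 1, 15, 0)},
    {"type": "Withdrawn",     "amount": 30, "ts": datetime(2024, 1, 2, 10, 0)},
]

def balance_at(events, as_of):
    """Replay only the events that happened at or before `as_of`."""
    balance = 0
    for e in events:  # assumes append order == time order
        if e["ts"] > as_of:
            break  # everything after the cutoff is ignored
        if e["type"] == "AccountOpened":
            balance = e["amount"]
        elif e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

midnight = datetime(2024, 1, 2, 0, 0)
print(balance_at(events, midnight))      # 50: the withdrawal came after midnight
print(balance_at(events, datetime.max))  # 20: full replay
```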

Projection — Events Into Queryable Shape

Replaying every event on every balance check is too expensive
→ Projection: pre-fold events into a separate table

events                        →  projection: user_balances
- AccountOpened(u=42, 0)            user_id | balance
- Deposited(u=42, 50)               --------|-------
- Withdrawn(u=42, 30)               42      | 20
- Deposited(u=42, 100)              ...     | ...
- ...                               (updated as events come in)

A projection is just a cache: if it gets corrupted, rebuild it from the events.
Multiple projections can be built from the same events (balance,
transaction history, monthly stats, etc.).
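A sketch of two projections fed by the same event list, with `rebuild` showing the "just a cache" property. The class names and dict-based tables are hypothetical stand-ins for real projection tables:

```python
from collections import defaultdict

class BalanceProjection:
    """Read model `user_balances`: user_id -> balance, updated per event."""
    def __init__(self):
        self.user_balances = {}

    def handle(self, event):
        user, kind = event["user"], event["type"]
        if kind == "AccountOpened":
            self.user_balances[user] = event["amount"]
        elif kind == "Deposited":
            self.user_balances[user] += event["amount"]
        elif kind == "Withdrawn":
            self.user_balances[user] -= event["amount"]

class HistoryProjection:
    """A second read model over the same events: per-user transaction list."""
    def __init__(self):
        self.history = defaultdict(list)

    def handle(self, event):
        if event["type"] in ("Deposited", "Withdrawn"):
            self.history[event["user"]].append((event["type"], event["amount"]))

def rebuild(projection_cls, events):
    """Throw the old cache away and refold it from the log."""
    projection = projection_cls()
    for event in events:
        projection.handle(event)
    return projection

events = [
    {"type": "AccountOpened", "user": 42, "amount": 0},
    {"type": "Deposited", "user": 42, "amount": 50},
    {"type": "Withdrawn", "user": 42, "amount": 30},
    {"type": "Deposited", "user": 42, "amount": 100},
]

balances = rebuild(BalanceProjection, events)
history = rebuild(HistoryProjection, events)
print(balances.user_balances)  # {42: 120}
print(history.history[42])     # [('Deposited', 50), ('Withdrawn', 30), ('Deposited', 100)]
```

In a live system the same `handle` methods would be called as each new event arrives; `rebuild` just replays the whole log through them.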

CQRS — Command and Query Responsibility Segregation

Often paired with event sourcing. Separate models for command (write) and query (read):

           Command side                Query side
           ─────────────                ──────────
Client →   POST /deposit               GET /balance
           ↓                           ↓
           validate                    read projection
           ↓                           (e.g. user_balances)
           write event                 ↓
           ↓                           return current balance
           (events table)              ↑
           ↓                           projection table
           projector consumes & updates →→→→→→→

Command model: complex domain logic, validation
Query model: per-use-case projections optimized for read

Pros:
- Independent scaling of command and query
- Many projections (graph, search index, cache)
Cons:
- Write-to-projection lag (typically milliseconds, longer if the projector falls behind)
- Two models = steeper learning curve
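A compressed, single-process sketch of the split, assuming a hypothetical in-memory `EventStore`. The projector runs synchronously here for simplicity; in a real deployment it usually consumes events asynchronously, which is where the lag comes from.

```python
class InsufficientFunds(Exception):
    pass

class EventStore:
    """Hypothetical in-memory stand-in for the append-only events table."""
    def __init__(self):
        self.events = []
        self.subscribers = []

    def append(self, event):
        self.events.append(event)
        for notify in self.subscribers:  # the projector; async in real systems
            notify(event)

# --- Query side: a read model kept up to date by a projector ---
user_balances = {}

def project(event):
    delta = event["amount"] if event["type"] == "Deposited" else -event["amount"]
    user_balances[event["user"]] = user_balances.get(event["user"], 0) + delta

def get_balance(user):  # GET /balance: reads the projection only
    return user_balances.get(user, 0)

# --- Command side: validate against the write model, then emit an event ---
store = EventStore()
store.subscribers.append(project)

def write_model_balance(user):
    """The command side folds its own view of the log for validation."""
    total = 0
    for e in store.events:
        if e["user"] == user:
            total += e["amount"] if e["type"] == "Deposited" else -e["amount"]
    return total

def handle_deposit(user, amount):  # POST /deposit
    if amount <= 0:
        raise ValueError("amount must be positive")
    store.append({"type": "Deposited", "user": user, "amount": amount})

def handle_withdraw(user, amount):  # POST /withdraw
    if amount <= 0:
        raise ValueError("amount must be positive")
    if write_model_balance(user) < amount:
        raise InsufficientFunds(user)
    store.append({"type": "Withdrawn", "user": user, "amount": amount})

handle_deposit(42, 50)
handle_withdraw(42, 30)
print(get_balance(42))  # 20
```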

Snapshots — Avoiding Full Replay

A user active for 10 years might accumulate 100,000 events
→ Replaying 100K events on every balance read = slow

Fix: snapshots
- Periodically, by event count (e.g. every 1,000 events) or by time (e.g. once a day)
- Dump the state at that point into a snapshot table
- Subsequent reads: latest snapshot + only newer events

Balance read (100K events, latest snapshot at event 99K):
- Read the snapshot at 99K (1 row)
- Replay only the 1K events after it
- Fast: 1K events replayed instead of 100K

Snapshots are also a cache: rebuildable from events if lost.
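A sketch of snapshot-plus-tail reads, assuming events are `(type, amount)` tuples and a snapshot every 1,000 events (both assumptions for illustration). In practice snapshots are written incrementally as events arrive; `take_snapshots` builds them in one pass only to keep the demo short.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    version: int  # number of events already folded into `balance`
    balance: int

SNAPSHOT_EVERY = 1000  # assumption: snapshot every 1,000 events

def apply(balance, event):
    kind, amount = event
    return balance + amount if kind == "Deposited" else balance - amount

def take_snapshots(events):
    """Walk the log once and record the state after every SNAPSHOT_EVERY events."""
    snapshots, balance = [], 0
    for i, event in enumerate(events, start=1):
        balance = apply(balance, event)
        if i % SNAPSHOT_EVERY == 0:
            snapshots.append(Snapshot(version=i, balance=balance))
    return snapshots

def read_balance(events, snapshots):
    """Start from the latest snapshot, then replay only the newer events."""
    start = snapshots[-1] if snapshots else Snapshot(version=0, balance=0)
    balance = start.balance
    for event in events[start.version:]:
        balance = apply(balance, event)
    return balance

events = [("Deposited", 1)] * 100_500  # hypothetical long-lived account
snapshots = take_snapshots(events)
print(snapshots[-1])                    # Snapshot(version=100000, balance=100000)
print(read_balance(events, snapshots))  # 100500, replaying only the last 500 events
```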

Outbox Pattern — Atomicity Across Systems

Problem:
  Command handler does two things:
  1. Insert event into DB
  2. Publish the event to a message broker (e.g. Kafka)

  If 1 succeeds but 2 fails → consumers never receive the event.
  The reverse order is just as bad: publish first, then a failed DB write = a phantom event that consumers saw but the store never recorded.

Fix: outbox pattern
  1. Inside a DB transaction:
     - INSERT into events
     - INSERT into outbox (same transaction)
  2. A separate process (the relay) polls the outbox for unsent rows
  3. Publish to Kafka → on success, delete the row or mark it sent

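→ The DB transaction guarantees the event row and its outbox row are written together, or not at all.
   Kafka publish failures are retried (the row stays in the outbox until marked sent).
   Delivery is at-least-once: if the relay crashes after publishing but before marking the row, it publishes again, so consumers should be idempotent.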
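A minimal outbox sketch using Python's built-in `sqlite3` as the database. The table layout and the `record_deposit` / `relay_once` helpers are hypothetical; `publish` stands in for a real broker client such as a Kafka producer, and the flaky version below simulates one failed send.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, type TEXT, data TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, event_id INTEGER,
                         payload TEXT, sent INTEGER DEFAULT 0);
""")

def record_deposit(user, amount):
    """Step 1: the event row and the outbox row commit together or not at all."""
    data = json.dumps({"user": user, "amount": amount})
    with conn:  # one transaction: commit on success, rollback on exception
        cur = conn.execute(
            "INSERT INTO events (type, data) VALUES (?, ?)", ("Deposited", data))
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (cur.lastrowid, data))

def relay_once(publish):
    """Steps 2-3: push unsent rows to the broker; mark sent only on success."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    for outbox_id, payload in rows:
        try:
            publish(payload)
        except Exception:
            return  # leave the row unsent; the next poll retries it
        # A crash right here (after publish, before UPDATE) means the row is
        # published again on the next poll: at-least-once delivery.
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (outbox_id,))

# Usage: a hypothetical broker that fails on the first attempt only.
attempts = {"n": 0}
published = []

def flaky_publish(payload):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("broker unavailable")  # simulated outage
    published.append(payload)

record_deposit(42, 50)
relay_once(flaky_publish)  # fails: the row stays in the outbox
relay_once(flaky_publish)  # retry succeeds
print(len(published))      # 1
```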

Replay — Time-Travel Debugging

Use cases:
- "What were all user balances at midnight yesterday?"
  → replay events to that point
- "Why did this user's balance go negative?"
  → inspect the event sequence
- New projection — replay from the start to backfill
- Bug found → reset projection + replay to fix (events stay immutable)

In traditional CRUD this is mostly impossible — UPDATE overwrites prior values.
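For the "why did this balance go negative?" case, a sketch that walks the event sequence and reports the first event after which the balance dropped below zero. The event shapes and the sign assigned to `Refunded` / `Fee` are illustrative assumptions:

```python
# Assumption: how each event type moves the balance.
DELTA_SIGN = {"Deposited": +1, "Refunded": +1, "Withdrawn": -1, "Fee": -1}

def trace_balance(events):
    """Yield (event, balance_after) for each event: the 'how did we get here' view."""
    balance = 0
    for event in events:
        balance += DELTA_SIGN[event["type"]] * event["amount"]
        yield event, balance

def first_negative(events):
    """Return the first (event, balance) where the balance went below zero, else None."""
    for event, balance in trace_balance(events):
        if balance < 0:
            return event, balance
    return None

events = [
    {"seq": 1, "type": "Deposited", "amount": 50},
    {"seq": 2, "type": "Withdrawn", "amount": 30},
    {"seq": 3, "type": "Fee", "amount": 25},
    {"seq": 4, "type": "Refunded", "amount": 10},
]

print(first_negative(events))  # ({'seq': 3, 'type': 'Fee', 'amount': 25}, -5)
```

Because the event type records the reason, the answer is "a fee", not merely "some update".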

Schema Evolution — The Biggest Trap

Events are immutable. But over time the schema needs to change:

v1 event: { type: "Deposited", amount: 50 }
v2 event: { type: "Deposited", amount: 50, currency: "USD" }
v3 event: { type: "Deposited", money: {value: 50, currency: "USD"} }

How do you handle the old v1 events?

Strategy 1 — never change schema (impossible)
Strategy 2 — upcast: transform old events to new schema on read
  read: v1 → upcaster → v3-shape used downstream
Strategy 3 — branch projection logic per version (case statements)
Strategy 4 — new event type (DepositedV2) handled separately
Strategy 5 — full migration (rewrite old events) — risky

→ Schema evolution is the biggest operational burden. Design event schemas carefully from the start.
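A sketch of Strategy 2 (upcasting) for the three shapes above. It assumes unversioned events are v1 and that v1 deposits were in USD; both are illustrative assumptions, not rules. Each upcaster only knows one step, so adding a v4 later means adding one function rather than touching every old case.

```python
def upcast_v1_to_v2(event):
    # Assumption: v1 predates multi-currency, so USD is the default.
    return {**event, "version": 2, "currency": "USD"}

def upcast_v2_to_v3(event):
    return {
        "type": event["type"],
        "version": 3,
        "money": {"value": event["amount"], "currency": event["currency"]},
    }

UPCASTERS = {1: upcast_v1_to_v2, 2: upcast_v2_to_v3}
LATEST = 3

def upcast(event):
    """Chain upcasters until the event reaches the latest shape.

    The stored event is never modified; conversion happens on read.
    """
    event = dict(event)
    version = event.get("version", 1)  # assumption: no version field = v1
    while version < LATEST:
        event = UPCASTERS[version](event)
        version = event["version"]
    return event

stored = [
    {"type": "Deposited", "amount": 50},
    {"type": "Deposited", "version": 2, "amount": 70, "currency": "EUR"},
    {"type": "Deposited", "version": 3, "money": {"value": 20, "currency": "USD"}},
]
for e in stored:
    print(upcast(e)["money"])
# {'value': 50, 'currency': 'USD'}
# {'value': 70, 'currency': 'EUR'}
# {'value': 20, 'currency': 'USD'}
```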

When to Use / When Not To

Use (event sourcing fits)

- Audit trail is core: finance, healthcare, legal
- Complex domain logic: natural pair with DDD
- Time-travel debugging required: systems where "how did we get here" matters
- Multiple read models needed: graph + search + analytics + cache etc.

Don't (overkill)

- Simple CRUD: blog posts, comments, settings
- Teams without bandwidth to learn it: unfamiliarity with DDD / event modeling turns into debt
- Every query needs strong consistency: projection lag is unacceptable
- Schema changes frequently: evolution costs stack up

Common Pitfalls

- Event sourcing everywhere: choose it per bounded context and mix with CRUD elsewhere.
- Missing the "why" in events: generic UpdatedField events have no audit value. Use meaningful types like PriceCorrected / Refunded.
- Projection rebuilds too slow: replaying millions of events from the start can take hours. Periodic snapshots help.
- Underestimating the eventually consistent query model: UIs that assume "just-created data is immediately visible" break. Optimistic UI updates or polling are needed.
- Storage explosion: keeping every event can reach TB scale. You need an archival strategy, or snapshots plus compression of older events.

Wrap-up

Event sourcing is powerful but heavy. It's the right answer only when audit / time-travel / multiple projections outweigh the learning and schema-evolution cost. For simple domains it's over-engineering.

In practice: choose per bounded context. Audit-heavy areas like payments / orders → event sourcing; the rest → CRUD. CQRS also works without event sourcing (just split the read and write models).