How does event sourcing work?

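A traditional DB stores only the current state: user.balance = 100. Event sourcing stores every change as an event: (deposit 50), (withdraw 30), ... The current state is a fold over those events. It is a heavy abstraction, but it offers strong properties such as an audit trail, time-travel debugging, and consistency. This guide covers the mechanics and when it is worth using.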

Traditional CRUD vs Event Sourcing

Traditional CRUD:
  a row in the users table is the current state
  UPDATE users SET balance = 100 WHERE id = 42;
  → the previous value is gone

Event Sourcing:
  every change is appended to the events table (append-only)
  INSERT INTO events (id, type, data, ts) VALUES
    (1, 'AccountOpened', '{"user": 42, "initial": 0}', t0);
  INSERT INTO events (id, type, data, ts) VALUES
    (2, 'Deposited',     '{"user": 42, "amount": 50}', t1);
  INSERT INTO events (id, type, data, ts) VALUES
    (3, 'Withdrawn',     '{"user": 42, "amount": 30}', t2);

  → current balance = fold(events) = 0 + 50 - 30 = 20
  → every change is kept permanently
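The fold on the line above can be sketched in Python. This is a minimal illustration that assumes events are plain in-memory objects. The `Event` dataclass, the `apply` function, and the use of integers for t0, t1, t2 are hypothetical stand-ins, not any particular framework's API.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    id: int
    type: str
    data: dict
    ts: int  # hypothetical: t0, t1, t2 modeled as integers 0, 1, 2


def apply(balance: int, event: Event) -> int:
    """Return the balance after one event. A pure function, so replay is deterministic."""
    if event.type == "AccountOpened":
        return event.data["initial"]
    if event.type == "Deposited":
        return balance + event.data["amount"]
    if event.type == "Withdrawn":
        return balance - event.data["amount"]
    raise ValueError(f"unknown event type: {event.type}")


# The same three rows as the INSERT statements above.
events = [
    Event(1, "AccountOpened", {"user": 42, "initial": 0}, 0),
    Event(2, "Deposited", {"user": 42, "amount": 50}, 1),
    Event(3, "Withdrawn", {"user": 42, "amount": 30}, 2),
]

balance = reduce(apply, events, 0)  # fold(events) = 0 + 50 - 30
print(balance)  # 20
```

Keeping `apply` free of side effects is what makes the rest of the pattern possible. Replaying the same events always yields the same state.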

Core idea: events are the source of truth

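Traditional:
  state = source of truth
  events are derived (an "an UPDATE happened" record is a by-product in a log or audit table)

Event Sourcing:
  event = source of truth
  state is derived (a cache built by replaying the events from the beginning)

→ state can be reconstructed at any point in time: "what was the balance at midnight yesterday?"
→ the "why" of every change survives as the event type (Withdrawn vs Refunded vs Fee)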
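To reconstruct state at an arbitrary moment, you replay only the events whose timestamp falls at or before that moment. The sketch below is minimal and assumes an in-memory list of events. The dates, the `state_at` helper, and the rule that `Refunded` adds to the balance while `Fee` subtracts from it are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    type: str
    amount: int
    ts: datetime


# Hypothetical semantics: which event types add to or subtract from the balance.
CREDIT = {"AccountOpened", "Deposited", "Refunded"}
DEBIT = {"Withdrawn", "Fee"}


def apply(balance: int, e: Event) -> int:
    if e.type in CREDIT:
        return balance + e.amount
    if e.type in DEBIT:
        return balance - e.amount
    raise ValueError(f"unknown event type: {e.type}")


def state_at(events: list[Event], as_of: datetime) -> int:
    """Rebuild the balance as of `as_of` by replaying only the events up to that moment."""
    balance = 0
    for e in sorted(events, key=lambda ev: ev.ts):
        if e.ts > as_of:
            break  # later events did not exist yet at `as_of`
        balance = apply(balance, e)
    return balance


events = [
    Event("AccountOpened", 0, datetime(2024, 1, 1, 9, 0)),
    Event("Deposited", 50, datetime(2024, 1, 1, 10, 0)),
    Event("Withdrawn", 30, datetime(2024, 1, 1, 15, 0)),
    Event("Fee", 2, datetime(2024, 1, 2, 8, 0)),
    Event("Refunded", 30, datetime(2024, 1, 2, 9, 0)),
]

print(state_at(events, datetime(2024, 1, 2)))  # balance at midnight: 20
print(state_at(events, datetime(2024, 1, 3)))  # after the fee and the refund: 48
```

Because the event types stay distinct (`Fee` vs `Withdrawn` vs `Refunded`), the same replay also tells you why the balance moved, not only by how much.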

Projection: turning events into a queryable shape

Replaying every event each time a user checks their balance gets expensive
→ projection: fold the events ahead of time and keep the result in a separate table

events                        →  projection: user_balances
- AccountOpened(u=42, 0)            user_id | balance
- Deposited(u=42, 50)               --------|-------
- Withdrawn(u=42, 30)               42      | 120
- Deposited(u=42, 100)              ...     | ...
- ...                               (updated as each event is processed)

A projection is just a cache. If it breaks, it can be regenerated from the events.
The same events can feed several projections (balance, transaction history,
monthly stats, etc.)
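A projector is just a function from events to table updates. The sketch below is a hypothetical in-memory version in which Python dicts stand in for projection tables. It feeds the four events from the diagram into two independent projections. After all four events, user 42's balance is 0 + 50 - 30 + 100 = 120.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    type: str
    user: int
    amount: int


class BalanceProjection:
    """Read model `user_balances`: user_id -> balance, updated one event at a time."""

    def __init__(self) -> None:
        self.user_balances: dict[int, int] = {}

    def handle(self, e: Event) -> None:
        if e.type == "AccountOpened":
            self.user_balances[e.user] = e.amount
        elif e.type == "Deposited":
            self.user_balances[e.user] += e.amount
        elif e.type == "Withdrawn":
            self.user_balances[e.user] -= e.amount


class HistoryProjection:
    """A second read model fed by the same events: per-user transaction history."""

    def __init__(self) -> None:
        self.history = defaultdict(list)  # user_id -> [(type, amount), ...]

    def handle(self, e: Event) -> None:
        if e.type in ("Deposited", "Withdrawn"):
            self.history[e.user].append((e.type, e.amount))


events = [
    Event("AccountOpened", 42, 0),
    Event("Deposited", 42, 50),
    Event("Withdrawn", 42, 30),
    Event("Deposited", 42, 100),
]

balances, history = BalanceProjection(), HistoryProjection()
for e in events:  # each projector sees every event, in order
    balances.handle(e)
    history.handle(e)

print(balances.user_balances)  # {42: 120}
print(history.history[42])     # [('Deposited', 50), ('Withdrawn', 30), ('Deposited', 100)]
```

Both projections are derived data. If you discard either one and rerun the loop over the same events, you get it back exactly.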

CQRS: Command and Query Responsibility Segregation

Often paired with event sourcing. It separates the model for commands (writes) from the model for queries (reads):

           Command side                Query side
           ─────────────               ──────────
Client →   POST /deposit               GET /balance
           ↓                           ↓
           validate                    read projection
           ↓                           (e.g. user_balances)
           write event                 ↓
           ↓                           return current balance
           (events table)              ↑
           ↓                           projection table
           projector updates it  →→→→→→

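Command model: complex domain logic and validation
Query model: projections optimized for fast reads (one per use case)

Pros:
- the command and query sides scale independently
- the query side can branch out into many projections (graph, search index, cache)
Cons:
- lag between writing an event and updating the projection (usually milliseconds)
- two separate models mean a steep learning curve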
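The split in the diagram can be sketched with the two sides as separate functions over shared in-memory data. This is a minimal, hypothetical sketch. A list of `(type, user, amount)` tuples stands in for the events table, and `handle_withdraw` / POST /withdraw is an assumed second command alongside the diagram's POST /deposit. The projector runs only when `catch_up` is called, which makes the lag visible.

```python
class InsufficientFunds(Exception):
    pass


# Write side: an append-only list of (type, user, amount) stands in for the events table.
event_store: list[tuple[str, int, int]] = []


def balance_from_events(user: int) -> int:
    """Command side validates against the events themselves, not the (possibly lagging) projection."""
    balance = 0
    for kind, u, amount in event_store:
        if u == user:
            balance += amount if kind == "Deposited" else -amount
    return balance


def handle_deposit(user: int, amount: int) -> None:  # POST /deposit
    if amount <= 0:
        raise ValueError("amount must be positive")
    event_store.append(("Deposited", user, amount))


def handle_withdraw(user: int, amount: int) -> None:  # hypothetical POST /withdraw
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance_from_events(user):
        raise InsufficientFunds(f"user {user} cannot withdraw {amount}")
    event_store.append(("Withdrawn", user, amount))


class BalanceProjector:
    """Query side: consumes new events and keeps the `user_balances` read model current."""

    def __init__(self) -> None:
        self.user_balances: dict[int, int] = {}
        self.position = 0  # how many events have been applied so far

    def catch_up(self) -> None:
        for kind, user, amount in event_store[self.position:]:
            delta = amount if kind == "Deposited" else -amount
            self.user_balances[user] = self.user_balances.get(user, 0) + delta
        self.position = len(event_store)


def get_balance(projector: BalanceProjector, user: int) -> int:  # GET /balance
    return projector.user_balances.get(user, 0)


projector = BalanceProjector()
handle_deposit(42, 50)
print(get_balance(projector, 42))  # 0: the projector has not run yet (the lag)
projector.catch_up()
print(get_balance(projector, 42))  # 50
handle_withdraw(42, 30)
projector.catch_up()
print(get_balance(projector, 42))  # 20
```

Scanning the whole store in `balance_from_events` keeps the sketch short. A real command side would typically load a single aggregate's events, or a snapshot of them, instead.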

Snapshot: cutting the cost of replay

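A user accumulates 100,000 events over 10 years
→ replaying 100K events on every balance lookup = slow

Fix: snapshots
- taken periodically (e.g. every 1,000 events) or on a schedule (e.g. daily)
- dump the state at that point into a separate table
- later reads → the latest snapshot + replay of only the events after it

Balance lookup (100K events, snapshot @ 99K):
- read the snapshot @ 99K (1 row)
- replay the 1K events that come after it
- fast

A snapshot is also a cache. If it is lost, it can be regenerated from the events.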
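The read path above (latest snapshot, then only the newer events) can be sketched as follows. This is a minimal sketch that assumes each event has been reduced to a signed integer amount and that the event log is an in-memory list. `Snapshot`, `take_snapshot`, and `load_balance` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `balance`
    balance: int


def apply(balance: int, delta: int) -> int:
    # Simplification: each event is a signed amount (+deposit / -withdrawal).
    return balance + delta


def take_snapshot(events: list[int], version: int) -> Snapshot:
    """Fold the first `version` events and keep the result (the 'dump' step)."""
    balance = 0
    for delta in events[:version]:
        balance = apply(balance, delta)
    return Snapshot(version, balance)


def load_balance(events: list[int], snapshot):
    """Start from `snapshot` (or from zero if it is None) and replay only later events.

    Returns (balance, number_of_events_replayed).
    """
    start, balance = (snapshot.version, snapshot.balance) if snapshot else (0, 0)
    for delta in events[start:]:
        balance = apply(balance, delta)
    return balance, len(events) - start


events = [1] * 100_000                # hypothetical history: 100K deposits of 1
snap = take_snapshot(events, 99_000)  # snapshot @ 99K

print(load_balance(events, None))  # (100000, 100000): full replay
print(load_balance(events, snap))  # (100000, 1000): snapshot + 1K events
```

Both paths return the same balance. The snapshot only changes how much work it takes to get there, which is why losing one is harmless.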

Outbox Pattern: atomicity for distributed events

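Problem:
  handling a command involves two steps:
  1. write the event to the DB
  2. publish the event to a message broker (Kafka)

  if 1 succeeds and 2 fails → consumers never receive the event
  the reverse has the same problem (publish succeeds, then the DB write fails = a phantom event)

Fix: the outbox pattern
  1. inside one DB transaction:
     - INSERT into the events table
     - INSERT into the outbox table (same transaction)
  2. a separate process (the relay) polls the outbox for unsent rows
  3. publish to Kafka → on success, delete the outbox row or set status = sent

→ DB transaction atomicity guarantees that the event and its outbox row are stored together or not at all.
   A failed Kafka publish can be retried (the row is still in the outbox).
   Delivery is at-least-once: if the relay crashes after publishing but before marking the row sent, the event is published again, so consumers should be idempotent.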

Replay: time-travel debugging

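Use cases:
- "what were all user balances at midnight yesterday?" → replay the events up to that moment
- "why did this user's balance go negative?" → inspect the event sequence
- adding a new projection → backfill it by replaying the events from the beginning
- a bug is found → reset only the projection and replay to fix it (the events are immutable)

In traditional CRUD this is nearly impossible, because an UPDATE overwrites the previous value.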
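The last two use cases, backfilling a new projection and fixing a buggy one, both come down to "start from an empty table and replay everything". This is a minimal sketch that assumes a hypothetical in-memory event log, with projectors written as plain functions. The bug shown, where withdrawals are added instead of subtracted, is invented for illustration.

```python
Event = tuple[str, int, int]  # (type, user, amount)

# The event log is immutable: a tuple of tuples that no projector can modify.
EVENTS: tuple[Event, ...] = (
    ("AccountOpened", 42, 0),
    ("Deposited", 42, 50),
    ("Withdrawn", 42, 30),
)


def buggy_projector(table: dict, e: Event) -> None:
    kind, user, amount = e
    if kind == "AccountOpened":
        table[user] = amount
    else:
        table[user] += amount  # BUG: withdrawals are added instead of subtracted


def fixed_projector(table: dict, e: Event) -> None:
    kind, user, amount = e
    if kind == "AccountOpened":
        table[user] = amount
    elif kind == "Deposited":
        table[user] += amount
    elif kind == "Withdrawn":
        table[user] -= amount


def withdrawal_count_projector(table: dict, e: Event) -> None:
    """A brand-new projection, backfilled from the full history."""
    kind, user, _ = e
    if kind == "Withdrawn":
        table[user] = table.get(user, 0) + 1


def rebuild(events, projector) -> dict:
    """Reset the projection to empty and replay every event from the beginning."""
    table: dict = {}
    for e in events:
        projector(table, e)
    return table


print(rebuild(EVENTS, buggy_projector))             # {42: 80}: wrong
print(rebuild(EVENTS, fixed_projector))             # {42: 20}: fixed by replay alone
print(rebuild(EVENTS, withdrawal_count_projector))  # {42: 1}: new projection, backfilled
```

Only the projector code changed between the buggy and fixed runs. The events were never touched, which is what makes this kind of repair safe.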

Schema Evolution: the biggest trap

Events are immutable, but over time the schema has to change:

v1 event: { type: "Deposited", amount: 50 }
v2 event: { type: "Deposited", amount: 50, currency: "USD" }
v3 event: { type: "Deposited", money: {value: 50, currency: "USD"} }

How do you handle the old v1 events?

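Strategy 1. Never change the schema (impossible in practice)
Strategy 2. Upcast: convert old events to the new schema when they are read
  read: v1 → upcaster → used with the v3 schema
Strategy 3. Branch on the schema version inside each projection (case statement)
Strategy 4. Handle the change as a separate, new event type (DepositedV2)
Strategy 5. Full migration (convert old events to the new schema and store them again): risky, because it rewrites history that is supposed to be immutable

→ schema evolution is the biggest operational burden, so design events carefully from the start.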
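Strategy 2 can be sketched as a chain of small upcasters, each lifting an event by one version, applied at read time. This is a minimal sketch with two assumptions. First, because the example events above carry no explicit version field, the version is inferred from the event's shape (real systems usually store one). Second, v1 amounts are assumed to be USD. `detect_version`, `UPCASTERS`, and `upcast` are hypothetical names.

```python
LATEST_VERSION = 3


def detect_version(e: dict) -> int:
    # Assumption: version inferred from shape, since the example events have no version field.
    if "money" in e:
        return 3
    if "currency" in e:
        return 2
    return 1


def v1_to_v2(e: dict) -> dict:
    # Assumption: v1 predates multi-currency support, so every v1 amount is USD.
    return {"type": e["type"], "amount": e["amount"], "currency": "USD"}


def v2_to_v3(e: dict) -> dict:
    return {"type": e["type"], "money": {"value": e["amount"], "currency": e["currency"]}}


UPCASTERS = {1: v1_to_v2, 2: v2_to_v3}


def upcast(stored: dict) -> dict:
    """Convert a stored event of any version to the latest schema at read time.

    The stored event itself is never modified; each step builds a new dict.
    """
    e = dict(stored)
    while (version := detect_version(e)) < LATEST_VERSION:
        e = UPCASTERS[version](e)
    return e


old = {"type": "Deposited", "amount": 50}
print(upcast(old))  # {'type': 'Deposited', 'money': {'value': 50, 'currency': 'USD'}}
```

Because each upcaster only knows about one version step, adding a v4 means writing a single `v3_to_v4` and bumping `LATEST_VERSION`. The projections then only ever see the latest shape.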

When to use it / when not to

Use it (event sourcing fits)

the audit trail is central: finance, healthcare, legal
complex domain logic: a natural partner for DDD
time-travel debugging is essential: systems that often ask "how did we end up in this state?"
multiple read models are needed: graph + search + analytics + cache, etc.

Don't use it (overkill)

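simple CRUD: blog posts, comments, settings, etc.
teams for whom the learning cost isn't worth it: with little DDD / event-modeling experience, it turns into debt.
every query needs strong consistency: areas where projection lag is unacceptable
the schema changes quickly: schema-evolution costs pile up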

Common pitfalls

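event sourcing everywhere: choose per bounded context. It can coexist with plain CRUD.
events without the "why": a generic event like UpdatedField has no audit value. Use meaningful types such as PriceCorrected / Refunded.
projection rebuilds are too slow: replaying millions of events from the beginning can take hours. Mitigate with periodic snapshots.
not understanding the eventually consistent query model: if the UI assumes "data I just created shows up immediately", it breaks. Use an optimistic UI or polling.
storage blow-up: keeping every event forever means terabytes of data. Use an archival strategy, or snapshots plus compression of old events.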

Wrap-up

Event sourcing is powerful but heavy. It is the right answer only when the value of audit, time travel, and multiple projections outweighs the learning and schema costs. For simple domains it is over-engineering.

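In practice, choose per bounded context. Use event sourcing only in audit-heavy areas such as finance, payments, and orders, and traditional CRUD everywhere else. CQRS also works without event sourcing, since separating reads from writes is useful on its own.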