How does the testing pyramid work?

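The testing pyramid traces back to Mike Cohn's 2009 book and is often summarized as unit 70 / integration 20 / E2E 10. That ratio is not an absolute law; it expresses a trade-off among speed, cost, and confidence. This guide covers what each layer of the pyramid verifies and why a modern alternative, the testing trophy, emerged.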

The intent of the pyramid: speed and cost

         /\
        / E\  ← E2E (slow, seconds per test; expensive, brittle)
       / 2  \
      /  E   \
     /────────\
    / Integ.   \  ← Integration (DB and service integration; in between)
   /────────────\
  /              \  ← Unit (fast, microseconds per test; cheap and precise)
 /     Unit       \
/──────────────────\

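The 70/20/10 ratio boils down to one principle:
"Write many fast, cheap tests and few slow, expensive ones."

→ Faster feedback loop, lower CI cost, easier debugging (a failing unit test points more precisely at the cause).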
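To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The per-test durations are assumptions picked from the ranges this guide mentions, not measurements, and the suite is assumed to run serially.

```python
# Hypothetical average duration per test, in seconds (assumptions, not measurements).
AVG_SECONDS = {"unit": 0.001, "integration": 0.5, "e2e": 15.0}


def suite_seconds(counts):
    """Estimate the serial runtime of a suite from its test counts per layer."""
    return sum(AVG_SECONDS[layer] * n for layer, n in counts.items())


# The same 1,000 tests, split 70/20/10 versus an inverted, E2E-heavy 10/20/70.
pyramid = {"unit": 700, "integration": 200, "e2e": 100}
inverted = {"unit": 100, "integration": 200, "e2e": 700}

print(f"pyramid:  {suite_seconds(pyramid) / 60:.1f} min")
print(f"inverted: {suite_seconds(inverted) / 60:.1f} min")
```

Under these assumed numbers the inverted suite takes several times longer for the same test count, and nearly all of the time in both suites comes from the E2E layer.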

What each layer means

Unit Test

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

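Characteristics:
- Covers the behavior of a single function or class
- No external dependencies (DB, network, filesystem); mock them if present
- Runs in under a millisecond (typically microseconds)
- Changing one file affects roughly 1-10 unit tests

Pros: fast feedback, precise failure location, stable (unaffected by the environment)
Cons: does not verify integrated behavior; mock-heavy tests become coupled to the implementation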
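When the unit under test does have an external dependency, one common approach is to pass that dependency in and replace it with a mock. The sketch below uses the standard library's `unittest.mock`; `greeting_for` and the `repo.find_name` method are hypothetical names invented for illustration.

```python
from unittest.mock import Mock


def greeting_for(user_id, repo):
    """Build a greeting; `repo` is anything with a find_name(user_id) method."""
    name = repo.find_name(user_id)
    return f"Hello, {name}!" if name else "Hello, guest!"


def test_greeting_known_user():
    repo = Mock()
    repo.find_name.return_value = "alice"  # no real DB is touched
    assert greeting_for(1, repo) == "Hello, alice!"
    repo.find_name.assert_called_once_with(1)


def test_greeting_unknown_user():
    repo = Mock()
    repo.find_name.return_value = None
    assert greeting_for(2, repo) == "Hello, guest!"
```

Note that `assert_called_once_with` checks how the function talks to its collaborator, which is exactly the kind of assertion that ties a test to the implementation when overused.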

Integration Test

def test_user_signup_saves_to_db():
    user = create_user("alice", "pw")  # writes to a real DB
    found = db.users.find_by_name("alice")
    assert found.id == user.id

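Characteristics:
- Combines several modules or external systems
- Uses a real DB (or a testcontainer), the filesystem, an in-memory broker
- Takes 10 ms to 1 s
- Verifies integration boundaries (SQL queries, ORM mapping, transactions)

Pros: verifies the real integration, with no mocks standing in for it
Cons: slow (CI time goes up), environment-dependent (DB schema and so on)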
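The snippet above relies on an application's own `create_user` and `db`. As a self-contained sketch of the same idea, the version below swaps in the standard library's `sqlite3` with an in-memory database, so real SQL runs without any server. The schema and helper functions are assumptions made up for this example.

```python
import sqlite3


def create_user(conn, name, password_hash):
    """Insert a user and return its generated id."""
    cur = conn.execute(
        "INSERT INTO users (name, password_hash) VALUES (?, ?)",
        (name, password_hash),
    )
    conn.commit()
    return cur.lastrowid


def find_by_name(conn, name):
    """Return (id, name) for the user, or None if there is no such user."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchone()


def test_user_signup_saves_to_db():
    conn = sqlite3.connect(":memory:")  # real SQL engine, throwaway database
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "name TEXT UNIQUE NOT NULL, "
        "password_hash TEXT NOT NULL)"
    )
    try:
        user_id = create_user(conn, "alice", "not-a-real-hash")
        assert find_by_name(conn, "alice") == (user_id, "alice")
        assert find_by_name(conn, "bob") is None
    finally:
        conn.close()
```

Unlike a mocked test, a typo in the SQL or a mismatch with the schema would make this test fail, which is the boundary an integration test is meant to guard. If production uses a different database engine, a container running that engine gives a more faithful check than SQLite.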

E2E Test

async def test_signup_flow():
    page = await browser.new_page()
    await page.goto("/signup")
    await page.fill('input[name="email"]', "a@b.com")
    await page.click('button[type="submit"]')
    await page.wait_for_selector(".welcome")
    assert "Welcome" in await page.text_content(".welcome")

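Characteristics:
- The whole flow from the user's point of view
- A real browser (driven by Playwright or Cypress) + real backend + DB
- Takes seconds (typically 5-30 s per test)
- Exercises frontend, backend, and infrastructure together

Pros: verifies the user experience directly
Cons: very slow, very flaky (timing, network), brittle (selector changes break it)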

Ice Cream Cone: the anti-pattern

        ____________________
       \   E2E (most tests)   /
        \______/¯¯¯¯¯¯\______/
              \Integ./
              \Unit /
               \___/

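Symptoms:
- 100+ E2E tests, almost no unit tests
- CI takes over an hour
- One change makes 5 E2E tests fail (and it is unclear where the real breakage is)
- "Just retry everything and it will pass"

Causes:
- "We only need to verify the user experience": partly true, but the cost explodes
- A legacy codebase, or one that is hard to mock
- Code changes less often than tests get written

Remedy: start every new feature with unit tests, and reserve E2E for the
        critical path (core user journeys such as signup and checkout).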

Testing Trophy: a modern alternative (Kent C. Dodds)

     ___
    |   |  ← E2E (few, critical path only)
   _|___|_
  |       |  ← Integration (the most tests) ← the core
   \_____/
   |     |  ← Unit (logic and calculation)
    \___/
      |    ← Static (TypeScript, ESLint) ← base
     ___

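How it differs from the pyramid:
- Integration is the thickest layer: on the frontend, the integrated
  behavior of a component (state + render + interaction) delivers the most value
- Static checks (typecheck, lint) form the base: with a modern toolchain
  this verification is close to free

Background:
- The pyramid is backend-centric (2009). On the frontend, slicing unit tests
  too finely increases coupling to the implementation.
- React Testing Library, Vitest, and Playwright component testing have made
  integration-style tests easier to write.
- "Write tests from the user's perspective and they naturally take an integration shape"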
Which shape is right?

Code type	Recommended shape	Reason
Pure algorithm (parser, math)	Unit-heavy (classic pyramid)	Few external dependencies, many edge cases
Backend API service	Pyramid with extra integration	Integration with the DB and external services is the core
Frontend SPA	Trophy (integration-heavy)	Component integration + user interaction
Microservice (consumer-provider)	Add contract tests	Boundaries between services
Data pipeline	Unit + small integration samples	Full E2E on large data is impractical
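The contract tests mentioned in the microservice row can be sketched in a few lines: the consumer declares which fields and types it relies on, and the provider's test suite checks its response against that declaration. Everything below (`USER_CONTRACT`, `violations`, `provider_get_user`) is a hypothetical, hand-rolled illustration; dedicated tools such as Pact formalize the same idea.

```python
# Consumer side: the fields and types the consumer relies on (hypothetical contract).
USER_CONTRACT = {"id": int, "name": str, "email": str}


def violations(payload, contract):
    """Return a list of human-readable contract violations (empty if none)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            problems.append(f"wrong type for {field}: {actual}")
    return problems


# Provider side: a stand-in for the provider's real response builder.
def provider_get_user(user_id):
    return {"id": user_id, "name": "alice", "email": "a@b.com", "plan": "free"}


def test_provider_honours_user_contract():
    # Extra fields ("plan") are fine; missing or retyped fields are not.
    assert violations(provider_get_user(1), USER_CONTRACT) == []
```

The point of the design is that neither service needs the other one running: the contract is the only shared artifact, so the check stays about as fast as a unit test.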

Common pitfalls

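Obsessing over the ratio: 70/20/10 is not a rule, only a way to convey the shape. Even 30% coverage is fine if the critical paths are well covered.
Unit-testing every function: for trivial code such as getters or DTO mapping, the typecheck is enough.
Mock-heavy unit tests: 100% coverage, yet the pieces do not actually work together → a reasonable amount of integration testing is essential.
Only adding E2E tests: a 30-minute feedback loop and an hour of debugging per failure. An expensive signal.
Retrying flaky tests: unless the real cause of the flakiness is fixed, trust in CI erodes (see the flaky tests guide).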
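The mock-heavy pitfall is easiest to see in a small case where the mock encodes a wrong assumption about the real collaborator. In this sketch, `RealUserApi` and `display_name` are hypothetical names; the mocked test passes even though the code is broken against the real dependency.

```python
from unittest.mock import Mock


class RealUserApi:
    """Stand-in for the real dependency; note that the key is 'username'."""

    def fetch(self, user_id):
        return {"id": user_id, "username": "alice"}


def display_name(api, user_id):
    # Bug: reads 'name', but the real API returns 'username'.
    return api.fetch(user_id).get("name", "unknown")


def test_display_name_with_mock():
    api = Mock()
    # The mock repeats the author's wrong assumption about the payload.
    api.fetch.return_value = {"id": 1, "name": "alice"}
    assert display_name(api, 1) == "alice"  # passes, and the line is "covered"


def observed_with_real_api():
    """What an integration test would see when using the real collaborator."""
    return display_name(RealUserApi(), 1)
```

Here `observed_with_real_api()` returns `"unknown"`, so an integration test asserting `"alice"` would fail and expose the mismatch that the mocked test hides.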

Wrap-up

The pyramid and the trophy are not ratio rules but visualizations of cost versus value. Pick the shape that fits the type of codebase; no single ratio suits every organization.

A practical starting point: typecheck + lint (free) as the base → a reasonable amount of unit tests (pure logic) and integration tests (boundaries) → E2E only for critical user journeys (signup, checkout). Fix or quarantine a flaky E2E test immediately.