How Distributed Tracing Actually Works

In a microservice environment, a single request may traverse 10 services. When p99 latency reaches 2s, which service is responsible? Traces answer that. This guide covers spans, context propagation, W3C Trace Context, and sampling, including the trap where 1% head sampling drops most of the slowest requests (solved by tail sampling).

Span — The Trace Building Block

A span:
  - one operation, start → end
  - its own span_id
  - metadata: name, start_time, duration, attributes (tags)
  - parent span_id (absent on the root span)
  - trace_id (shared by every span in the trace)

trace = a tree of spans with the same trace_id

Example:
trace_id = abc123
  span_id = root (POST /login)
  ├ span_id = s1 (parent=root, "validate input")
  ├ span_id = s2 (parent=root, "DB query")
  │ └ span_id = s2a (parent=s2, "INDEX scan")
  └ span_id = s3 (parent=root, "Redis set")
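
To make the "tree of spans" idea concrete, here is a minimal sketch that rebuilds the example tree from flat span records, the way a backend might after receiving spans in arbitrary order. The record shape (`spanId`, `parentId`, `name`) and the helpers `buildTree` and `render` are hypothetical, chosen for illustration only.

```javascript
// Flat span records, using the IDs and names from the example above.
const spans = [
  { spanId: "root", parentId: null, name: "POST /login" },
  { spanId: "s1", parentId: "root", name: "validate input" },
  { spanId: "s2", parentId: "root", name: "DB query" },
  { spanId: "s2a", parentId: "s2", name: "INDEX scan" },
  { spanId: "s3", parentId: "root", name: "Redis set" },
];

// buildTree (hypothetical helper): link each span to its parent by ID.
function buildTree(spans) {
  const byId = new Map(spans.map((s) => [s.spanId, { ...s, children: [] }]));
  let root = null;
  for (const node of byId.values()) {
    const parent = node.parentId === null ? undefined : byId.get(node.parentId);
    if (parent) parent.children.push(node);
    else root = node; // no known parent: treat as the root span
  }
  return root;
}

// render (hypothetical helper): one indented line per span, depth-first.
function render(node, depth = 0) {
  const line = `${"  ".repeat(depth)}${node.spanId} (${node.name})`;
  return [line, ...node.children.flatMap((c) => render(c, depth + 1))];
}

const tree = buildTree(spans);
const lines = render(tree);
console.log(lines.join("\n"));
```

A span whose parent never arrives (for example, because propagation broke) would show up here as a second "root", which is how broken trees typically surface in a trace viewer.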

Trace Context Propagation — Stitching Services

Service A calls service B. For B's spans to live in A's trace, trace_id + parent span_id must travel in HTTP headers.

W3C Trace Context (standard):

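HTTP request from A to B (IDs shortened for readability; real values are lowercase hex: 2-character version, 32-character trace_id, 16-character parent span_id, 2-character flags):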
  traceparent: 00-abc123-s5-01
                │  │     │  │
                │  │     │  └─ trace flags (01 = sampled)
                │  │     └──── parent span_id (A's current span)
                │  └─────────── trace_id
                └─────────────── version

When B starts a new span:
  - reuse trace_id (abc123)
  - generate new span_id
  - parent_id = s5 (from header)

→ The whole trace can be reconstructed as one tree.
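
The three steps B performs can be sketched as follows. This is an illustration under stated assumptions, not the OpenTelemetry propagator: `parseTraceparent`, `startChildSpan`, and `toTraceparent` are hypothetical helpers, and the sketch only handles the four-field version `00` layout. Real services would normally rely on the SDK's built-in propagator.

```javascript
const crypto = require("crypto");

// W3C layout: version(2 hex)-trace_id(32 hex)-parent_id(16 hex)-flags(2 hex)
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// parseTraceparent (hypothetical): returns null for malformed headers.
function parseTraceparent(header) {
  const m = TRACEPARENT_RE.exec(header);
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // Version "ff" and all-zero IDs are invalid per the spec.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Service B: start a span that continues the caller's trace.
function startChildSpan(header, name) {
  const ctx = parseTraceparent(header);
  return {
    name,
    // Reuse the caller's trace_id; start a fresh trace if there is no valid context.
    traceId: ctx ? ctx.traceId : crypto.randomBytes(16).toString("hex"),
    // Always generate a new span_id for B's own span.
    spanId: crypto.randomBytes(8).toString("hex"),
    // The caller's current span becomes this span's parent.
    parentId: ctx ? ctx.parentId : null,
    // Simplification: a new root is left unsampled in this sketch.
    sampled: ctx ? ctx.sampled : false,
  };
}

// Header for B's own outgoing calls: same trace_id, B's span as the new parent.
function toTraceparent(span) {
  return `00-${span.traceId}-${span.spanId}-${span.sampled ? "01" : "00"}`;
}

const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const span = startChildSpan(incoming, "handle-request");
console.log(toTraceparent(span));
```

Each hop repeats the same pattern, so the trace_id stays constant end to end while the parent field always names the immediate caller's span.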

Instrumentation — Creating Spans

Manual

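// OpenTelemetry API (Node.js example)
const { trace, SpanStatusCode } = require("@opentelemetry/api");

const tracer = trace.getTracer("my-service");

// `db` is assumed to be the application's own data-access layer.
async function handleLogin(req) {
  // startActiveSpan makes this span the parent of spans created inside the callback.
  return tracer.startActiveSpan("handle-login", async (span) => {
    span.setAttribute("user_id", req.userId);
    try {
      const user = await db.users.find(req.userId);
      span.setAttribute("user.found", !!user);
      return user;
    } catch (e) {
      // Attach the exception and mark the span as failed before rethrowing.
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR, message: e.message });
      throw e;
    } finally {
      span.end(); // always end the span, even on error
    }
  });
}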

Auto-instrumentation

OpenTelemetry auto-instrumentation libraries hook popular frameworks (Express, FastAPI, Spring Boot, gRPC client) and create spans automatically. Start with auto; add manual spans on hot paths.

Sampling — Don't Store Every Trace

Storing a trace for every request is cost-prohibitive at high volume, so sampling is effectively mandatory.

Head Sampling — Decide at the Start

When the first service creates a trace_id:
  random < 0.01 ? sampled=true : sampled=false

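Pros: simple; the decision travels in the trace flags, so every service keeps or drops the same traces
Cons: the decision is made before latency or outcome is known, so 99% of slow and failing requests are dropped too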
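
A minimal sketch of head sampling, assuming a 1% rate: the root service rolls once per trace, and downstream services read the sampled flag from `traceparent` instead of rolling again. The function names are hypothetical, and the `rand` parameter exists only so the decision can be exercised deterministically.

```javascript
const SAMPLE_RATE = 0.01; // assumed rate, matching the 1% used in the text

// Root service: no incoming context, so decide once when the trace_id is created.
function decideAtRoot(rate = SAMPLE_RATE, rand = Math.random) {
  return rand() < rate;
}

// Downstream services: never re-roll, just honor the flag carried in traceparent.
function decideDownstream(traceparent) {
  const flags = parseInt(traceparent.split("-")[3], 16);
  return (flags & 0x01) === 1;
}

const keptHeader = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const droppedHeader = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00";
console.log(decideDownstream(keptHeader), decideDownstream(droppedHeader));
```

Nothing in `decideAtRoot` can look at duration or status, because neither exists yet when the decision is made. That is the structural reason slow requests are dropped at the same rate as fast ones.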

Tail Sampling — Decide After the Trace Ends

Buffer all spans → evaluate when the trace finishes:
  - duration > 1s ? → keep
  - any error? → keep
  - normal + fast → keep 1%

Pros: 100% of slow / error traces preserved (huge debugging value)
Cons: needs the whole trace before deciding, so spans must be buffered (memory cost) and traces reach the backend only after a wait window
Tool: OpenTelemetry Collector's tail_sampling processor
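
The decision rules above can be sketched in a few lines. This is an illustration of the idea, not the Collector's implementation: the span shape (`traceId`, `startMs`, `durationMs`, `error`), the `TailSampler` class, and the thresholds are assumptions made for the example.

```javascript
const SLOW_MS = 1000;       // "duration > 1s" threshold from the rules above
const BASELINE_RATE = 0.01; // keep 1% of normal, fast traces

// Wall-clock duration of the whole trace: earliest start to latest end.
function traceDurationMs(spans) {
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.startMs + s.durationMs));
  return end - start;
}

// Apply the rules in order: slow, then error, then baseline probability.
function keepTrace(spans, rand = Math.random) {
  if (spans.length === 0) return { keep: false, reason: "empty" };
  if (traceDurationMs(spans) > SLOW_MS) return { keep: true, reason: "slow" };
  if (spans.some((s) => s.error)) return { keep: true, reason: "error" };
  return rand() < BASELINE_RATE
    ? { keep: true, reason: "baseline" }
    : { keep: false, reason: "dropped" };
}

// Buffers spans per trace_id until the trace is considered finished.
class TailSampler {
  constructor() {
    this.buffer = new Map();
  }
  addSpan(span) {
    if (!this.buffer.has(span.traceId)) this.buffer.set(span.traceId, []);
    this.buffer.get(span.traceId).push(span);
  }
  // Call once no more spans are expected, e.g. after a wait window.
  finish(traceId, rand = Math.random) {
    const spans = this.buffer.get(traceId) || [];
    this.buffer.delete(traceId);
    const decision = keepTrace(spans, rand);
    return { ...decision, spans: decision.keep ? spans : [] };
  }
}

const sampler = new TailSampler();
sampler.addSpan({ traceId: "t1", startMs: 0, durationMs: 1500, error: false });
sampler.addSpan({ traceId: "t2", startMs: 0, durationMs: 40, error: true });
sampler.addSpan({ traceId: "t3", startMs: 0, durationMs: 40, error: false });
const slow = sampler.finish("t1");
const failed = sampler.finish("t2");
const fast = sampler.finish("t3", () => 0.5); // fixed "random" value above 1%, so dropped
console.log(slow.reason, failed.reason, fast.reason); // prints "slow error dropped"
```

The `buffer` map is where the cost lives: every in-flight trace occupies memory until `finish` is called, which is the trade-off listed under Cons.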

Real Backends

Tool	Type	Strength
Jaeger	OSS (CNCF)	Self-host, Cassandra/Elasticsearch backend
Tempo (Grafana)	OSS	S3/GCS backend — cheap, Grafana integration
Zipkin	OSS	Oldest (2012, Twitter), simple
Honeycomb	SaaS	High cardinality + BubbleUp (auto anomaly)
Lightstep (now ServiceNow)	SaaS	Distributed systems focus, huge trace volume
Datadog APM	SaaS	Metrics / logs unified, strong marketing
AWS X-Ray	Cloud	Auto-integrated with AWS services

Common Pitfalls

Missed context propagation: crossing async tasks or queues without carrying the trace context breaks the tree. Inject the context into the message or task metadata on the producer side and restore it on the consumer side.
Span attribute explosion: high-cardinality attributes (user_id etc.) strain backend indexing. Choose deliberately.
1% head sampling only: most of the slowest and failing traces are dropped. Use tail sampling or an "errors always sampled" policy.
SDK overhead: instrumentation itself costs roughly 1-5% CPU. Profile, and add manual spans only on hot paths.
Underestimating trace volume: 1M requests/day × 100 spans × 1 KB per span ≈ 100 GB/day, and 10M requests/day reaches 1 TB/day. Retention policy required.
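
The volume estimate in the last pitfall is easy to work through for your own traffic. The sketch below assumes 1 KB per span and decimal units (1 KB = 1000 bytes); every input is an illustrative assumption rather than a measurement, and both helper functions are hypothetical.

```javascript
// Raw bytes of span data per day, before compression or indexing overhead.
function dailyTraceBytes({ requestsPerDay, spansPerRequest, bytesPerSpan, keepPercent = 100 }) {
  return (requestsPerDay * spansPerRequest * bytesPerSpan * keepPercent) / 100;
}

// Format a byte count using decimal units.
function formatBytes(bytes) {
  const units = ["B", "KB", "MB", "GB", "TB", "PB"];
  let value = bytes;
  let i = 0;
  while (value >= 1000 && i < units.length - 1) {
    value /= 1000;
    i++;
  }
  return `${value.toFixed(1)} ${units[i]}`;
}

const base = { requestsPerDay: 1_000_000, spansPerRequest: 100, bytesPerSpan: 1000 };

const unsampled = formatBytes(dailyTraceBytes(base));
const tenTimes = formatBytes(dailyTraceBytes({ ...base, requestsPerDay: 10_000_000 }));
const sampled = formatBytes(dailyTraceBytes({ ...base, keepPercent: 1 }));

console.log(unsampled); // 100.0 GB
console.log(tenTimes);  // 1.0 TB
console.log(sampled);   // 1.0 GB
```

Multiplying the daily figure by the retention window gives the storage to budget for, which is why sampling rate and retention policy are best decided together.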

Wrap-up

Distributed tracing is fundamentally span trees + context propagation. W3C Trace Context standardizes it for cross-vendor compatibility. OpenTelemetry provides a single instrumentation SDK.

Practical: you can't store every trace — sampling is mandatory. Start with head sampling; consider tail sampling for production debugging value. Always keep errors.