In a microservice environment, a single request may traverse 10 services. If p99 latency is 2s, where is the time going? Traces answer that. This guide covers spans, context propagation, W3C Trace Context, and sampling, including the "1% sampling drops the slowest requests" trap (solved by tail sampling).
Span — The Trace Building Block
A span:
- one operation, start → end
- metadata: name, start_time, duration, attributes (tags)
- parent span_id (optional)
- trace_id (root ID for the whole trace)
trace = a tree of spans with the same trace_id
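To make the definition concrete, here is a minimal sketch in plain JavaScript (no SDK) that rebuilds a trace tree from a flat list of spans. The field names (`traceId`, `spanId`, `parentSpanId`) and the `buildTrace` helper are assumptions chosen for illustration, not the OpenTelemetry data model.

```javascript
// Rebuild a trace tree from a flat list of spans (illustrative field names).
function buildTrace(spans, traceId) {
  const nodes = new Map();
  // Keep only the spans of this trace and give each one a children list.
  for (const span of spans) {
    if (span.traceId === traceId) {
      nodes.set(span.spanId, { ...span, children: [] });
    }
  }
  let root = null;
  for (const node of nodes.values()) {
    const parent = nodes.get(node.parentSpanId);
    if (parent) {
      parent.children.push(node);
    } else if (node.parentSpanId == null) {
      root = node; // the span without a parent is the root
    }
    // A span whose parent was never received is an orphan and is skipped here.
  }
  return root;
}

const spans = [
  { traceId: "abc123", spanId: "root", parentSpanId: null, name: "POST /login" },
  { traceId: "abc123", spanId: "s1", parentSpanId: "root", name: "validate input" },
  { traceId: "abc123", spanId: "s2", parentSpanId: "root", name: "DB query" },
  { traceId: "abc123", spanId: "s2a", parentSpanId: "s2", name: "INDEX scan" },
  { traceId: "abc123", spanId: "s3", parentSpanId: "root", name: "Redis set" },
  { traceId: "zzz999", spanId: "x1", parentSpanId: null, name: "other trace" },
];

const tree = buildTrace(spans, "abc123");
console.log(tree.name, "->", tree.children.map((c) => c.name));
```

Spans arrive at the backend in no particular order, which is why the sketch indexes them by `spanId` first and links parents in a second pass.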
Example:
trace_id = abc123
span_id = root (POST /login)
├ span_id = s1 (parent=root, "validate input")
├ span_id = s2 (parent=root, "DB query")
│ └ span_id = s2a (parent=s2, "INDEX scan")
└ span_id = s3 (parent=root, "Redis set")
Trace Context Propagation — Stitching Services
Service A calls service B. For B's spans to live in A's trace, trace_id + parent span_id must travel in HTTP headers.
W3C Trace Context (standard):
HTTP request from A to B:
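traceparent: 00-abc123-s5-01
             │  │      │  │
             │  │      │  └─ trace flags (01 = sampled)
             │  │      └──── parent span_id (A's current span)
             │  └─────────── trace_id
             └────────────── version
(IDs are shortened for readability: a real trace_id is 32 lowercase hex characters and a span_id is 16.)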
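The header layout above can be sketched as a small parser and formatter. This is a simplified illustration that only handles the version `00` layout; `parseTraceparent` and `formatTraceparent` are hypothetical helper names, and the sample value is the example header from the W3C specification.

```javascript
// Parse and format a W3C `traceparent` header value (version 00 layout only).
// Layout: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex).
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_RE.exec(String(header).trim());
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  // Version "ff" and all-zero IDs are invalid.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) {
    return null;
  }
  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 1;
  return { version, traceId, parentSpanId, sampled };
}

function formatTraceparent({ traceId, spanId, sampled }) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

const ctx = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
console.log(ctx);
```

Returning `null` for a malformed header lets the caller fall back to starting a fresh trace instead of propagating garbage.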
When B starts a new span:
- reuse trace_id (abc123)
- generate new span_id
- parent_id = s5 (from header)
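The three steps above can be sketched as follows. This is a dependency-free illustration, not how an SDK implements it: `startChildSpan`, `outgoingTraceparent`, and `randomHex` are hypothetical helpers, header validation is omitted, and `Math.random` stands in for a proper ID generator.

```javascript
// Service B: continue the caller's trace from an incoming traceparent header.
function randomHex(bytes) {
  // Math.random keeps the sketch dependency-free; it is not a production ID generator.
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

function startChildSpan(incomingTraceparent, name) {
  const [, traceId, parentSpanId, flags] = incomingTraceparent.split("-");
  return {
    name,
    traceId,              // reuse the caller's trace_id
    spanId: randomHex(8), // generate a new span_id (16 hex characters)
    parentSpanId,         // parent_id comes from the header
    flags,                // keep the caller's sampling decision
  };
}

// Header B sends when it calls the next service: B's span becomes the parent.
function outgoingTraceparent(span) {
  return `00-${span.traceId}-${span.spanId}-${span.flags}`;
}

const incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
const childSpan = startChildSpan(incoming, "GET /profile");
console.log(outgoingTraceparent(childSpan));
```

Only the middle span_id field changes from hop to hop; the trace_id and the sampling flag stay constant for the whole request.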
→ The whole trace can be reconstructed as one tree.
Instrumentation — Creating Spans
Manual
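// OpenTelemetry API (Node.js example); `db` stands in for your own data-access layer
const { trace, SpanStatusCode } = require("@opentelemetry/api");

const tracer = trace.getTracer("my-service");

async function handleLogin(req) {
  // startActiveSpan makes this span the parent of spans created inside the callback
  return tracer.startActiveSpan("handle-login", async (span) => {
    span.setAttribute("user_id", req.userId);
    try {
      const user = await db.users.find(req.userId);
      span.setAttribute("user.found", !!user);
      return user;
    } catch (e) {
      // Attach the exception and mark the span as failed before rethrowing
      span.recordException(e);
      span.setStatus({ code: SpanStatusCode.ERROR, message: e.message });
      throw e;
    } finally {
      span.end(); // always end the span, on success or failure
    }
  });
}
Auto-instrumentation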
OpenTelemetry auto-instrumentation libraries hook popular frameworks (Express, FastAPI, Spring Boot, gRPC client) and create spans automatically. Start with auto; add manual spans on hot paths.
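The core idea behind such hooks is wrapping a library function so that each call produces a span without any change to application code. The sketch below shows only that idea in plain JavaScript; it is not how the OpenTelemetry packages are implemented, and `instrumentMethod`, `finishedSpans`, and `fakeDb` are hypothetical names.

```javascript
// Illustrative only: wrap a method so every call is recorded as a span.
const finishedSpans = []; // stand-in for an exporter

function instrumentMethod(target, methodName, spanName) {
  const original = target[methodName];
  target[methodName] = async function (...args) {
    const span = { name: spanName, startTime: Date.now(), status: "OK" };
    try {
      // Call through to the original method with the same `this` and arguments.
      return await original.apply(this, args);
    } catch (err) {
      span.status = "ERROR";
      span.error = String(err);
      throw err;
    } finally {
      span.durationMs = Date.now() - span.startTime;
      finishedSpans.push(span);
    }
  };
}

// Hypothetical client object standing in for a database driver.
const fakeDb = {
  async query(sql) {
    return [{ id: 1, sql }];
  },
};

instrumentMethod(fakeDb, "query", "db.query");
```

After the wrap, a call such as `await fakeDb.query("SELECT 1")` behaves as before and leaves one `db.query` entry in `finishedSpans`.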
Sampling — Don't Store Every Trace
Storing a trace for every request is cost-prohibitive at scale, so sampling is effectively mandatory.
Head Sampling — Decide at the Start
When the first service creates a trace_id:
random < 0.01 ? sampled=true : sampled=false
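A minimal sketch of that decision follows. As a variation on the `random < 0.01` rule, it derives the number from the trace_id itself, which is already random, so the same trace always gets the same answer. The helper names and the choice of the last 32 bits are assumptions for illustration, not a specific SDK's algorithm.

```javascript
// Head sampling sketch: decide once, from the trace_id, when the trace is created.
function headSample(traceId, ratio) {
  // Map the last 8 hex characters (32 bits) of the trace_id to a number in [0, 1).
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000;
  return bucket < ratio;
}

// The root service records the decision in the trace flags; downstream
// services read the flag instead of making their own random choice.
function rootTraceparent(traceId, spanId, ratio) {
  const sampled = headSample(traceId, ratio);
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

console.log(rootTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", 0.01));
```

Nothing in `headSample` knows how long the request will take or whether it will fail, which is exactly the limitation described next.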
Pros: simple, consistent across services
Cons: the decision is made before latency or errors are known, so 99% of the slowest requests are dropped too
Tail Sampling — Decide After the Trace Ends
Buffer all spans → evaluate when the trace finishes:
- duration > 1s ? → keep
- any error? → keep
- normal + fast → keep 1%
Pros: 100% of slow / error traces preserved (huge debugging value)
Cons: needs the whole trace before deciding → buffer + slight latency
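The policy above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the Collector's implementation: `recordSpan` and `decide` are hypothetical helpers, span fields such as `durationMs` and `status` are made up for the example, and `decide` is assumed to be called once the trace is considered finished (for instance after a wait timeout).

```javascript
// Tail sampling sketch: buffer spans per trace, decide once the trace is complete.
const buffers = new Map(); // traceId -> spans seen so far

function recordSpan(span) {
  if (!buffers.has(span.traceId)) buffers.set(span.traceId, []);
  buffers.get(span.traceId).push(span);
}

function decide(traceId, { slowMs = 1000, baselineRatio = 0.01, random = Math.random } = {}) {
  const spans = buffers.get(traceId) || [];
  buffers.delete(traceId); // free the buffer whatever the outcome
  const root = spans.find((s) => s.parentSpanId == null);
  const isSlow = root !== undefined && root.durationMs > slowMs;
  const hasError = spans.some((s) => s.status === "ERROR");
  if (isSlow) return { keep: true, reason: "slow", spans };
  if (hasError) return { keep: true, reason: "error", spans };
  // Normal and fast: keep only a small random baseline.
  const keep = random() < baselineRatio;
  return { keep, reason: keep ? "baseline" : "dropped", spans };
}

recordSpan({ traceId: "demo", spanId: "a", parentSpanId: null, durationMs: 1800, status: "OK" });
console.log(decide("demo").reason);
```

The `buffers` map is where the cost shows up: every in-flight trace occupies memory until its decision is made.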
Tool: OpenTelemetry Collector's tail_sampling processor
Real Backends
| Tool | Type | Strength |
|---|---|---|
| Jaeger | OSS (CNCF) | Self-host, Cassandra/Elasticsearch backend |
| Tempo (Grafana) | OSS | S3/GCS backend — cheap, Grafana integration |
| Zipkin | OSS | Oldest (2012, Twitter), simple |
| Honeycomb | SaaS | High cardinality + BubbleUp (auto anomaly) |
| Lightstep (now ServiceNow) | SaaS | Distributed systems focus, huge trace volume |
| Datadog APM | SaaS | Metrics / logs unified, strong marketing |
| AWS X-Ray | Cloud | Auto-integrated with AWS services |
Common Pitfalls
- Missed context propagation: crossing async tasks or queues without carrying the trace context breaks the tree. Wrap producers and consumers so the context travels with the work.
- Span attribute explosion: high-cardinality attributes (user_id etc.) strain backend indexing. Choose attributes deliberately.
- 1% head sampling only: risks dropping the slowest and error traces. Use tail sampling or an "errors always sampled" rule.
- SDK overhead: instrumentation itself can cost roughly 1-5% CPU. Profile, and add manual spans only on hot paths.
- Underestimating trace volume: 1M requests × 100 spans × ~1 KB per span ≈ 100 GB, so 10M requests per day already means about 1 TB/day. A retention policy is required.
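For the first pitfall, a wrapper around publish and consume might look like the sketch below. It uses an in-memory array in place of a real broker; `publish`, `consume`, and the message shape are hypothetical, the sampled flag is assumed to be `01`, and the consumer's span_id is fixed for readability where a real system would generate it.

```javascript
// Carry trace context across a queue by putting traceparent in message headers.
const queue = []; // in-memory stand-in for a real message broker

function publish(payload, currentSpan) {
  // Assumes the trace is sampled (flags = 01).
  const traceparent = `00-${currentSpan.traceId}-${currentSpan.spanId}-01`;
  queue.push({ headers: { traceparent }, payload });
}

function consume(handler) {
  const message = queue.shift();
  if (!message) return undefined;
  const [, traceId, parentSpanId] = message.headers.traceparent.split("-");
  // The consumer's span joins the producer's trace instead of starting a new one.
  const span = {
    name: "process-message",
    traceId,
    parentSpanId,
    spanId: "b7ad6b7169203331", // fixed for the sketch; generate randomly in practice
  };
  return handler(message.payload, span);
}

const producerSpan = { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", spanId: "00f067aa0ba902b7" };
publish({ orderId: 42 }, producerSpan);
const handled = consume((payload, span) => ({ payload, span }));
console.log(handled.span.traceId === producerSpan.traceId);
```

Without the `headers.traceparent` field, the consumer would have no choice but to start a new trace, and the tree would be split in two at the queue.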
Wrap-up
Distributed tracing is fundamentally span trees + context propagation. W3C Trace Context standardizes it for cross-vendor compatibility. OpenTelemetry provides a single instrumentation SDK.
Practical: you can't store every trace — sampling is mandatory. Start with head sampling; consider tail sampling for production debugging value. Always keep errors.