How to safely propagate user IDs via OpenTelemetry Baggage

Attach a sanitized, hashed user token to the OpenTelemetry context at the entry point, explicitly copy that context across every thread, executor, and broker boundary that does not inherit it, and verify that the baggage header is present on outbound requests before they leave your process.

Context and when it matters

The two baggage failures seen most often in production are user IDs silently disappearing across asynchronous execution boundaries and baggage headers being truncated or dropped for exceeding W3C Baggage size limits. The symptom is the same either way: downstream spans arrive in Jaeger or Tempo without a baggage.user.id attribute, breaking multi-tenant query filtering, SLA attribution, and security boundary enforcement.

This scenario matters most when user context must flow from an HTTP gateway through Kafka consumers, gRPC backends, or ThreadPoolExecutor workers — any boundary where context propagation is not automatic. If you only need user ID on the spans within a single service, use span attributes instead (see Baggage vs Span Attributes: When to Use What).


Why user IDs drop: the context detachment model

The diagram below shows the two safe paths (green) and the two common failure paths (red) for carrying a user ID from an HTTP handler into a thread worker and onto an outbound gRPC call.

[Figure: User ID baggage propagation paths. Starting from an HTTP handler that calls baggage.set_baggage() and context.attach(ctx), two paths keep the baggage (asyncio.create_task() into an asyncio Task; copy_context().run() into a thread worker) and two silently drop it (run_in_executor() without a copied context; a Kafka consumer callback with no context attached). The paths that keep the baggage end at the outbound gRPC / HTTP call, where propagate.inject(headers) writes baggage: user.id=…]

Thread pools, message broker consumer callbacks, and gRPC interceptors do not automatically inherit the parent contextvars.Context. When a worker picks up a task, context propagation is silently severed unless you explicitly copy and re-attach it.

One exception: Python 3.7+ asyncio.create_task() does copy the calling coroutine’s contextvars.Context at creation time, so baggage survives that specific boundary without extra work. concurrent.futures.ThreadPoolExecutor, loop.run_in_executor(), and most message queue consumer callbacks do not — those require contextvars.copy_context().
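
The difference between these boundaries can be reproduced with the standard library alone. The sketch below uses a plain ContextVar named user_id as a stand-in for the OpenTelemetry context slot (an assumption made so the example runs without the SDK); the helper names are illustrative only.

```python
import asyncio
import contextvars

# Stand-in for the OTel context slot; a hypothetical variable for illustration.
user_id = contextvars.ContextVar("user_id", default=None)


def read_user_id():
    """Runs on an executor thread and reports what it can see."""
    return user_id.get()


async def read_user_id_async():
    """Runs as an asyncio task and reports what it can see."""
    return user_id.get()


async def main():
    user_id.set("usr_demo")
    loop = asyncio.get_running_loop()

    # create_task() copies the current context when the task is created.
    from_task = await asyncio.create_task(read_user_id_async())

    # run_in_executor() hands the callable to a pool thread with no copy.
    from_executor = await loop.run_in_executor(None, read_user_id)

    # copy_context() + snapshot.run carries the values onto the pool thread.
    snapshot = contextvars.copy_context()
    from_copied = await loop.run_in_executor(None, snapshot.run, read_user_id)

    return from_task, from_executor, from_copied


results = asyncio.run(main())
print(results)
```

A Context object cannot be entered while it is already running elsewhere (snapshot.run raises RuntimeError in that case), so take one snapshot per submitted task rather than sharing a single snapshot across concurrent workers.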


Minimal working code block

The complete pattern in one file: propagator registration, sanitization, context attachment, and thread-safe dispatch.

import re
import hashlib
import contextvars
from concurrent.futures import ThreadPoolExecutor
from opentelemetry import baggage, context, propagate
from opentelemetry.propagators.composite import CompositePropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from opentelemetry.baggage.propagation import W3CBaggagePropagator

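# ── 1. Propagator registration ────────────────────────────────────────────────
# These two are also the OpenTelemetry Python default
# (OTEL_PROPAGATORS="tracecontext,baggage"); registering them explicitly
# guards against an environment override that omits baggage.
propagate.set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),   # traceparent / tracestate headers
    W3CBaggagePropagator(),            # baggage header (key-value pairs)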
]))

# ── 2. Sanitization: reject bad shapes, hash to prevent PII leakage ───────────
_USER_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{8,64}$")

def safe_user_token(raw_user_id: str) -> str:
    """Returns a fixed-length ASCII token safe for HTTP headers."""
    if not _USER_ID_RE.match(raw_user_id):
        # Do not echo the raw ID: exception messages end up in logs.
        raise ValueError("Rejected user ID: failed format check")
    digest = hashlib.sha256(raw_user_id.encode()).hexdigest()[:16]
    return f"usr_{digest}"                 # 20-char, no PII, printable ASCII

# ── 3. Attach to the current execution context ────────────────────────────────
def attach_user_id(raw_user_id: str) -> object:
    """Returns the token passed to context.detach() in a finally block."""
    token_value = safe_user_token(raw_user_id)
    ctx = baggage.set_baggage("user.id", token_value)
    return context.attach(ctx)            # all outbound propagators now see user.id

# ── 4. Thread-safe dispatch via copy_context ──────────────────────────────────
def process_request(raw_user_id: str) -> None:
    token = attach_user_id(raw_user_id)
    try:
        snapshot = contextvars.copy_context()   # freeze the current context state
        with ThreadPoolExecutor(max_workers=4) as pool:
            future = pool.submit(snapshot.run, _worker)  # worker inherits baggage
            future.result()                     # surface worker exceptions instead of dropping them
    finally:
        context.detach(token)                   # always clean up

def _worker() -> None:
    user_id = baggage.get_baggage("user.id")
    assert user_id is not None, "user.id missing — context was not copied correctly"
    # ... business logic ...

Implementation detail: each line mapped to its tracing concept

# TraceContextTextMapPropagator extracts traceparent/tracestate from inbound
# headers and injects them into outbound ones; W3CBaggagePropagator does the
# same for the baggage header.  The two use separate headers and context keys.
propagate.set_global_textmap(CompositePropagator([
    TraceContextTextMapPropagator(),
    W3CBaggagePropagator(),
]))

# baggage.set_baggage() returns a NEW Context object — it does not mutate the
# current one.  You must pass that new context to context.attach() or the value
# will not be visible to propagators running on the same thread.
ctx = baggage.set_baggage("user.id", safe_user_token(raw))
token = context.attach(ctx)   # token is a handle — keep it for detach()

# contextvars.copy_context() snapshots all ContextVar values at the call site,
# including the OTel Context slot.  snapshot.run(fn) executes fn in that
# snapshot, so baggage.get_baggage() inside fn returns the correct value
# even though it runs on a different OS thread.
snapshot = contextvars.copy_context()
pool.submit(snapshot.run, _worker)

# propagate.inject() reads the active Context (or a supplied one) and writes
# the W3C `baggage` header into a carrier dict, plus `traceparent`/`tracestate`
# when a valid span is active.  Pass the dict as the headers of the outbound
# request; instrumented HTTP clients perform this injection for you.
headers: dict[str, str] = {}
propagate.inject(headers)   # headers now contains "baggage: user.id=usr_a1b2…"

Decision criteria

Use this technique — user ID in baggage metadata — when all three of the following are true:

  • The user ID must be visible to services that do not share a database or session store with the entry point.
  • The downstream service uses it for routing, tenant isolation, or rate-limiting decisions — not just for attaching to its own spans.
  • The value can be pseudonymised (hashed) so it carries no recoverable PII in transit.

Use span attributes (not baggage) when you only need the user ID attached to spans within the current service, or when the value is needed for metrics aggregation rather than cross-boundary routing. The trade-off is explained in depth in Baggage vs Span Attributes: When to Use What.


Common pitfalls

  • Not detaching the context token. context.attach() returns a token that must be passed to context.detach() in a finally block. Leaking tokens corrupts the context stack for subsequent requests handled by the same thread, causing one user’s ID to bleed into another user’s spans — a silent multi-tenant data leak.

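  • Propagating the raw user ID without hashing. Raw user IDs in HTTP headers are logged by every proxy, load balancer, and CDN in the path. A SHA-256 digest truncated to 16 hex characters gives a stable correlation token that does not expose the ID directly, and it fits comfortably inside the baggage header’s per-entry 4096-byte limit. Treat the result as pseudonymous rather than anonymous: an unkeyed hash of a short or enumerable ID can be reversed by hashing every candidate, so use a keyed hash (HMAC) if your ID space is guessable.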

  • Registering an incomplete propagator list. propagate.set_global_textmap() replaces the global propagator, so a composite that lists only TraceContextTextMapPropagator stops the baggage header from being injected or extracted at all, while trace IDs keep flowing and hide the problem. The order of the two entries matters far less: in the Python implementation they read and write separate headers and context keys, so keep trace context first as a convention rather than a correctness requirement.
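
As a variation on the hashing step, the sketch below swaps the plain SHA-256 digest for HMAC-SHA-256, so the token cannot be recomputed by someone who knows or guesses the raw ID but not the key. This is a substitute technique, not the pattern used in the listing above; keyed_user_token, the secret_key parameter, and the demo key are hypothetical names, and a real key would come from your secret store.

```python
import hashlib
import hmac


def keyed_user_token(raw_user_id: str, secret_key: bytes) -> str:
    """Return a 20-character ASCII token derived from the ID and a secret key."""
    digest = hmac.new(
        secret_key, raw_user_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    return f"usr_{digest}"


# Demo key for illustration only; never hard-code a real key.
demo_key = b"demo-key-load-from-your-secret-store"

token_a = keyed_user_token("user-abc12345", demo_key)
token_b = keyed_user_token("user-abc12345", demo_key)
token_other_key = keyed_user_token("user-abc12345", b"a-different-key")

print(token_a == token_b)          # same ID and key give the same token
print(token_a == token_other_key)  # a different key gives a different token
```

Every service that needs to correlate on the token only has to compare strings, so the key can stay at the entry point. Rotating the key changes every token, which breaks correlation across the rotation boundary.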


Troubleshooting FAQ

Why is baggage.get_baggage("user.id") returning None inside my worker?

The worker is running in a thread that did not inherit the calling thread’s contextvars.Context. Take a snapshot with contextvars.copy_context() on the calling thread and submit snapshot.run with the worker as its argument, for example pool.submit(snapshot.run, worker_fn), instead of submitting the function directly to the executor.

Why is the baggage header absent from outbound HTTP requests?

Check that propagate.set_global_textmap() was called before any requests were made. If a framework initialises its HTTP client at import time, the propagator may not yet be registered. Move SDK initialisation to the application entry point, before any client creation.

Why does the baggage header appear but contain the wrong value?

baggage.set_baggage() does not mutate the current context — it returns a new one. If you discard the return value or forget context.attach(), the global propagators inject an empty or stale context. Always capture the return value of both calls.


Validation script for CI

Add this to your test suite to catch propagation regressions before they reach production:

from opentelemetry import baggage, context, propagate

# safe_user_token is the helper from the minimal working code block; import it
# from wherever that module lives in your project.

def test_user_id_propagation() -> None:
    safe_id = safe_user_token("user-abc12345")
    ctx = baggage.set_baggage("user.id", safe_id)
    token = context.attach(ctx)
    try:
        headers: dict[str, str] = {}
        propagate.inject(headers)
        assert "baggage" in headers, "baggage header missing — propagator not registered"
        assert f"user.id={safe_id}" in headers["baggage"], "user.id value mismatch"
        assert len(headers["baggage"].encode()) < 8192, "baggage header exceeds W3C size limit"
    finally:
        context.detach(token)

Edge cases and hardening

  • Third-party proxies stripping large headers. CDNs and legacy API gateways often drop headers exceeding 4 KB. Keep each baggage entry under 4096 bytes and monitor len(headers.get("baggage", "").encode()) in your middleware. Emit a metric when it exceeds 3500 bytes.
  • Inbound baggage as an attack surface. Never blindly propagate inbound baggage from untrusted callers. Run an ingress allowlist filter that drops any key not in your approved set before passing the context to internal services. Unvalidated baggage is a header-injection vector.
  • Multi-tenant key collisions. If multiple teams share a baggage namespace, prefix keys with the tenant or service name (e.g., acme.user.id) to prevent silent overwrites. See Tenant Context Propagation in Multi-Tenant SaaS for a full isolation pattern.
  • Kafka and other brokers. Message broker consumer callbacks have no ambient context. Extract the baggage header from the message envelope, call propagate.extract(carrier), and attach the result before processing. See Propagating Trace Context Through Kafka Consumers for a complete example.
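
The ingress allowlist and the size check from the list above can be sketched without the SDK by working on the raw header string. This is a simplified illustration, not a full W3C Baggage parser: it inspects only the key of each list-member and does not decode or validate values. ALLOWED_BAGGAGE_KEYS, WARN_BYTES, and both function names are hypothetical.

```python
ALLOWED_BAGGAGE_KEYS = frozenset({"user.id"})  # hypothetical approved set
WARN_BYTES = 3500                              # soft threshold for emitting a metric


def filter_baggage_header(header: str, allowed=ALLOWED_BAGGAGE_KEYS) -> str:
    """Return the header with every list-member outside the allowlist removed."""
    kept = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        # A member looks like key=value[;property...]; only the key is checked.
        key, sep, _rest = member.partition("=")
        if sep and key.strip() in allowed:
            kept.append(member)
    return ",".join(kept)


def baggage_size_exceeds(header: str, limit: int = WARN_BYTES) -> bool:
    """True when the encoded header is larger than the soft limit."""
    return len(header.encode("utf-8")) > limit


inbound = "user.id=usr_0123456789abcdef,debug=true,internal.role=admin"
filtered = filter_baggage_header(inbound)
print(filtered)
print(baggage_size_exceeds(filtered))
```

Running the filter on the raw header before extraction means untrusted keys never enter the context at all. If the context has already been extracted, the same allowlist can be applied to its entries instead.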
