March 6, 2026 · 7 min read · by Muhammad Amal programming

TL;DR: Use the gen_ai.* attribute namespace so dashboards built by one team work for everyone / capture prompts and completions as span events, not attributes / migrate ad-hoc names centrally with the Collector’s transform processor.

In my last post on OpenTelemetry LLM tracing I quietly used attribute names like gen_ai.request.model and gen_ai.usage.input_tokens without explaining where they came from. They aren’t mine. They’re part of the OpenTelemetry GenAI semantic conventions, and adopting them is the difference between telemetry that composes and telemetry that’s a per-team snowflake.

I learned this the expensive way. Three teams in one org each instrumented their LLM calls independently. One used model, another llm.model.name, a third ai.model. Token counts were tokens_in / prompt_tokens / input_token_count. Every shared dashboard needed three OR clauses, every alert had three variants, and onboarding a new service meant reverse-engineering whatever the author felt like that day. A naming standard would have cost an afternoon up front.

Semantic conventions are that standard. They’re a published, versioned agreement on what to call things so a span emitted by a Python service and one emitted by a Go service describe the same concept with the same key. This post covers what the GenAI conventions specify, how to emit conformant telemetry, and how to migrate an existing service without a flag-day rewrite.

What the GenAI Conventions Actually Specify

The conventions are organized into a few groups. The full, authoritative list lives in the OpenTelemetry GenAI semantic conventions — treat what follows as an orientation, not a replacement for the spec.

Span attributes describe the operation. The core ones:

  • gen_ai.operation.name — the operation: chat, text_completion, embeddings, execute_tool.
  • gen_ai.system — the provider: openai, anthropic, aws.bedrock, vertex_ai.
  • gen_ai.request.model — the model you asked for.
  • gen_ai.response.model — the model that actually answered (often more specific).
  • gen_ai.request.temperature, gen_ai.request.max_tokens, gen_ai.request.top_p — sampling parameters.
  • gen_ai.usage.input_tokens, gen_ai.usage.output_tokens — token accounting.
  • gen_ai.response.finish_reasons — an array of stop reasons.

Span naming. The convention is {gen_ai.operation.name} {gen_ai.request.model} — for example chat gpt-4o. That’s low-cardinality and self-describing.
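
To make the naming rule and the core keys concrete, here is a minimal sketch of a check you could run in a unit test against a span's name and attribute dict. The function name conformance_problems and the REQUIRED_KEYS tuple are assumptions of this example, not something the spec defines; consult the spec for the authoritative requirement levels.

```python
# Hypothetical conformance check. REQUIRED_KEYS is this sketch's own choice of
# "must have" attributes, not a list copied from the spec.
REQUIRED_KEYS = (
    "gen_ai.operation.name",
    "gen_ai.system",
    "gen_ai.request.model",
)


def conformance_problems(span_name: str, attributes: dict) -> list[str]:
    """Return human-readable problems; an empty list means the span passes."""
    problems = [
        f"missing attribute: {key}"
        for key in REQUIRED_KEYS
        if key not in attributes
    ]
    operation = attributes.get("gen_ai.operation.name")
    model = attributes.get("gen_ai.request.model")
    # The span name should be "{operation} {model}", e.g. "chat gpt-4o".
    if operation and model:
        expected = f"{operation} {model}"
        if span_name != expected:
            problems.append(f"span name {span_name!r} should be {expected!r}")
    return problems
```

A check like this is cheap to run in CI for each instrumented service, which catches drift before it reaches a shared dashboard.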

Events carry the message content. This is the part teams most often get wrong. Prompts and completions are not span attributes. They’re structured events on the span, which keeps large payloads out of the attribute set and lets you drop or sample them independently.

Emitting Conformant Spans

Here’s a wrapper that follows the conventions. The shape of the code matters less than the keys — those are the contract.

# genai_span.py
from contextlib import contextmanager
from opentelemetry import trace
from opentelemetry.trace import SpanKind, Status, StatusCode

tracer = trace.get_tracer("genai.client", "1.0.0")


@contextmanager
def genai_span(operation: str, system: str, request_model: str, **params):
    """Open a span that conforms to the GenAI semantic conventions."""
    span_name = f"{operation} {request_model}"
    # The except block below records errors itself, so switch off the
    # automatic recording to avoid a duplicate exception event on the span.
    with tracer.start_as_current_span(
        span_name,
        kind=SpanKind.CLIENT,
        record_exception=False,
        set_status_on_exception=False,
    ) as span:
        span.set_attribute("gen_ai.operation.name", operation)
        span.set_attribute("gen_ai.system", system)
        span.set_attribute("gen_ai.request.model", request_model)
        # Optional sampling parameters: set only if present.
        for key in ("temperature", "max_tokens", "top_p"):
            if key in params and params[key] is not None:
                span.set_attribute(f"gen_ai.request.{key}", params[key])
        try:
            yield span
        except Exception as exc:
            # error.type is the conventional attribute for the failure class.
            span.set_attribute("error.type", type(exc).__qualname__)
            span.set_status(Status(StatusCode.ERROR, type(exc).__name__))
            span.record_exception(exc)
            raise

And the call site, recording response attributes and finish reasons as an array:

# call_site.py
from openai import OpenAI
from genai_span import genai_span

client = OpenAI()


def conformant_chat(messages: list[dict]) -> str:
    with genai_span(
        operation="chat",
        system="openai",
        request_model="gpt-4o-2024-11-20",
        temperature=0.2,
        max_tokens=512,
    ) as span:
        resp = client.chat.completions.create(
            model="gpt-4o-2024-11-20",
            messages=messages,
            temperature=0.2,
            max_tokens=512,
        )
        span.set_attribute("gen_ai.response.model", resp.model)
        span.set_attribute("gen_ai.response.id", resp.id)
        # finish_reasons is an ARRAY per the spec, even for a single choice.
        span.set_attribute(
            "gen_ai.response.finish_reasons",
            [c.finish_reason or "unknown" for c in resp.choices],
        )
        if resp.usage:
            span.set_attribute(
                "gen_ai.usage.input_tokens", resp.usage.prompt_tokens
            )
            span.set_attribute(
                "gen_ai.usage.output_tokens", resp.usage.completion_tokens
            )
        return resp.choices[0].message.content or ""

Note gen_ai.response.finish_reasons is plural and an array. People reflexively write a scalar finish_reason because that’s what the SDK gives them per choice. The convention models the whole response, which can have multiple choices, so it’s an array. Match the spec even when it feels like overkill for the single-choice case — that’s what makes the data portable.
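
If some code paths hand you a scalar and others a list, a small coercion helper keeps the attribute array-valued everywhere. This is a sketch under my own assumptions: the name normalize_finish_reasons is hypothetical, and the "unknown" placeholder simply mirrors the call site above rather than anything the spec mandates.

```python
from collections.abc import Iterable


def normalize_finish_reasons(value: object) -> list[str]:
    """Coerce a scalar, None, or iterable of reasons into a list of strings."""
    if value is None:
        return []
    # A bare string is a single reason, not an iterable of characters.
    if isinstance(value, str):
        return [value]
    if isinstance(value, Iterable):
        return [str(item) if item is not None else "unknown" for item in value]
    return [str(value)]
```

Pass the result straight to span.set_attribute("gen_ai.response.finish_reasons", ...) so the single-choice and multi-choice cases produce the same shape.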

Capturing Prompts and Completions as Events

Message content goes in events. The convention defines event names per role. Here’s a helper that records them and respects a capture toggle, because in many environments you must not persist prompt content.

# genai_events.py
import json
import os
from opentelemetry.trace import Span

# Content capture is opt-in. Default off so production doesn't leak PII.
# The comparison is case-insensitive, so "True" and "TRUE" also enable it.
CAPTURE_CONTENT = (
    os.getenv("OTEL_GENAI_CAPTURE_CONTENT", "false").strip().lower() == "true"
)


def record_messages(
    span: Span, messages: list[dict], system: str = "openai"
) -> None:
    """Emit one event per input message, per the GenAI conventions."""
    for msg in messages:
        role = msg.get("role", "user")
        # The role is always recorded; the content only when capture is on.
        body: dict = {"role": role}
        if CAPTURE_CONTENT:
            body["content"] = msg.get("content", "")
        span.add_event(
            f"gen_ai.{role}.message",
            attributes={"gen_ai.system": system,
                        "body": json.dumps(body)},
        )


def record_completion(
    span: Span, content: str, finish_reason: str, system: str = "openai"
) -> None:
    """Emit the model's response as a choice event."""
    body: dict = {"finish_reason": finish_reason, "index": 0}
    if CAPTURE_CONTENT:
        body["message"] = {"role": "assistant", "content": content}
    span.add_event(
        "gen_ai.choice",
        attributes={"gen_ai.system": system, "body": json.dumps(body)},
    )

The OTEL_GENAI_CAPTURE_CONTENT switch matters. The conventions explicitly treat content capture as opt-in because prompts routinely contain user data subject to retention rules. Keep it off in production by default; flip it on in a staging environment or a sampled eval pipeline where you’ve thought through retention.
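
For the sampled case, one option is to gate capture on both the flag and a deterministic slice of trace IDs, so the same trace is either fully captured or not at all. The sketch below is an assumption-laden illustration: flag_enabled, should_capture, and the sample_rate parameter are hypothetical names, and using the low 64 bits of the trace ID as a uniform value is borrowed from the general idea of ratio-based trace sampling rather than from the GenAI conventions.

```python
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(raw: Optional[str]) -> bool:
    """Interpret an environment variable value as a boolean, defaulting to off."""
    return raw is not None and raw.strip().lower() in TRUTHY


def should_capture(raw_flag: Optional[str], trace_id: int, sample_rate: float) -> bool:
    """Capture content only when the flag is on AND the trace falls in the sample."""
    if not flag_enabled(raw_flag):
        return False
    if sample_rate >= 1.0:
        return True
    if sample_rate <= 0.0:
        return False
    # Treat the low 64 bits of the trace id as a uniformly distributed value.
    bucket = trace_id & 0xFFFFFFFFFFFFFFFF
    return bucket < int(sample_rate * (1 << 64))
```

Because the decision depends only on the trace ID, every service in a request path reaches the same answer without coordination.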

Migrating an Existing Service

You can’t rename attributes across a fleet in one commit. The Collector lets you do it centrally without touching service code. Use the transform processor, which ships in the Collector contrib distribution, to rename legacy keys on ingest.

# collector-config.yaml
processors:
  transform/genai_migration:
    trace_statements:
      - context: span
        statements:
          # Old key -> conventional key. Copy then delete.
          - set(attributes["gen_ai.request.model"], attributes["model"])
            where attributes["model"] != nil
          - delete_key(attributes, "model")
          - set(attributes["gen_ai.usage.input_tokens"],
                attributes["prompt_tokens"])
            where attributes["prompt_tokens"] != nil
          - delete_key(attributes, "prompt_tokens")
          - set(attributes["gen_ai.usage.output_tokens"],
                attributes["completion_tokens"])
            where attributes["completion_tokens"] != nil
          - delete_key(attributes, "completion_tokens")

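# Assumes the otlp receiver, the batch processor, and the otlphttp exporter
# are configured elsewhere in this file.
service: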
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform/genai_migration, batch]
      exporters: [otlphttp]

This buys you a migration window: dashboards switch to gen_ai.* immediately, services migrate at their own pace, and the Collector normalizes the lag. Once every service emits conventional keys natively, delete the processor.
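
Before rolling the processor out, it can help to pin down the intended mapping in a form you can unit test. The sketch below mirrors the copy-then-delete pattern in plain Python, using the legacy names from the three-team story earlier in this post. The names LEGACY_TO_CANONICAL and normalize_attributes are hypothetical, and unlike the OTTL statements above, this version deliberately lets an existing conventional key win over a legacy one.

```python
# Legacy keys taken from the examples earlier in this post; extend as needed.
LEGACY_TO_CANONICAL = {
    "model": "gen_ai.request.model",
    "llm.model.name": "gen_ai.request.model",
    "ai.model": "gen_ai.request.model",
    "tokens_in": "gen_ai.usage.input_tokens",
    "prompt_tokens": "gen_ai.usage.input_tokens",
    "input_token_count": "gen_ai.usage.input_tokens",
    "completion_tokens": "gen_ai.usage.output_tokens",
}


def normalize_attributes(attributes: dict) -> dict:
    """Return a copy with legacy keys renamed to their conventional names."""
    result = dict(attributes)
    for legacy, canonical in LEGACY_TO_CANONICAL.items():
        if legacy in result:
            value = result.pop(legacy)
            # Design choice of this sketch: a natively emitted key wins.
            result.setdefault(canonical, value)
    return result
```

Keeping the table in one place also gives you a checklist: each entry should correspond to a set/delete_key pair in the Collector config, and each can be removed once no service emits that legacy key.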

Common Pitfalls

  • Putting message content in attributes. Large payloads bloat span size and many backends truncate attributes silently. Content goes in events.
  • Scalar finish_reason. The convention is gen_ai.response.finish_reasons, plural, an array. A scalar breaks queries written against the spec.
  • Inventing a gen_ai.cost attribute. Cost is derived (tokens times price) and price changes over time. The conventions don’t define a cost attribute on purpose — compute it downstream.
  • Capturing content by default. That’s a compliance incident waiting to happen. Make it opt-in.
  • Pinning to a stale convention version. The GenAI conventions are still evolving. Note which version you target and revisit on upgrades.
  • Mixing gen_ai.system values. Pick the canonical string (openai, not OpenAI or oai). Casing and aliases fragment your data just like custom keys do.
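
On the cost pitfall, "compute it downstream" can be as small as a lookup plus a multiplication over the conventional token attributes. The following is a sketch only: the price table, the model name example-model, and the function estimate_cost_usd are all made up for illustration, and real prices vary by provider and change over time.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical prices in USD per million tokens, for illustration only.
PRICE_PER_MILLION = {
    "example-model": {"input": Decimal("2.50"), "output": Decimal("10.00")},
}


def estimate_cost_usd(
    attributes: dict, prices: dict = PRICE_PER_MILLION
) -> Optional[Decimal]:
    """Derive cost from token counts; return None when the model has no price."""
    # Prefer the model that actually answered, fall back to the requested one.
    model = attributes.get("gen_ai.response.model") or attributes.get(
        "gen_ai.request.model"
    )
    price = prices.get(model)
    if price is None:
        return None
    input_tokens = attributes.get("gen_ai.usage.input_tokens", 0)
    output_tokens = attributes.get("gen_ai.usage.output_tokens", 0)
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / Decimal(1_000_000)
```

Because the price table lives outside the span, a price change is a data update in the analysis layer and old spans can be re-costed without re-instrumenting anything.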

Troubleshooting

Symptom: dashboards show data for some services and not others. Cause: services emit different attribute keys for the same concept. Fix: add the transform processor to normalize legacy keys, then migrate services to conventional names.

Symptom: span content is missing in production but present in staging. Cause: OTEL_GENAI_CAPTURE_CONTENT is off in production — which is the intended default. Fix: this is correct behavior; if you need content, capture it in a sampled, access-controlled pipeline rather than flipping the flag fleet-wide.

Symptom: finish_reasons query returns nothing. Cause: the service emits a scalar finish_reason. Fix: change the service to emit the array-valued gen_ai.response.finish_reasons, or map it in the Collector.

Symptom: span attributes are truncated in the backend. Cause: message content was placed in an attribute and exceeded the backend’s attribute size limit. Fix: move content to gen_ai.*.message events.

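Symptom: the transform processor fails to start. Cause: an invalid OTTL statement. The Collector parses OTTL at startup and refuses to run if a statement does not parse. Fix: read the startup error, which names the offending statement, and check it for misspelled function names and unbalanced quotes or parentheses. A missing where guard is not a startup error; it only changes which spans the statement touches at runtime.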

What’s Next

With a shared attribute vocabulary in place, your telemetry composes — a dashboard built once works for every service that follows the conventions. The natural follow-on is turning gen_ai.usage.* into money: per-request cost telemetry built on these exact attribute names, which is the next post in this series.
