スキルOfficialmonitoring

🔍otel-genai-instrumentation

プラグイン: honeycomb
ソース: GitHub で見る ↗

説明

OpenTelemetry（オープンソースの監視技術）を使って、GenAIやLLM（大規模言語モデル）アプリケーションをHoneycomb向けに計測・監視するガイドを提供します。プロンプトやレスポンスの内容取得、AIエージェント（自律的に判断・動作するAI）の失敗検出に対応しています。次のような場合に使用: 「GenAIアプリを計測したい」「LLMの呼び出しにトレース機能を追加したい」「AIエージェントをトレースしたい」「OpenAIを計測したい」「Anthropicを計測したい」「GenAI監視」「ツール呼び出しをトレースしたい」「LLMのトークン使用量」「埋め込み（テキストを数値化する処理）を計測したい」「MCPをトレースしたい」「GenAIメトリクス」「LangChainを計測したい」「GenAIスパン（処理の記録単位）を追加したい」「プロンプトを取得したい」「LLMのレスポンスを取得したい」「GenAI向け内容取得を有効化したい」「ストリーミング（連続データ配信）トレース」「ストリーミングレスポンスをトレースしたい」、またはGenAI/LLMアプリケーション計測に関する任意のご質問

原文を表示

Guides instrumentation of GenAI/LLM applications with OpenTelemetry for Honeycomb, including content capture and agent failure detection. Trigger phrases: "instrument my GenAI app", "add tracing to LLM calls", "trace AI agent", "instrument OpenAI", "instrument Anthropic", "GenAI observability", "trace tool calling", "LLM token usage", "instrument embeddings", "trace MCP", "GenAI metrics", "instrument LangChain", "add GenAI spans", "capture prompts", "capture LLM responses", "enable GenAI content capture", "streaming tracing", "trace streaming responses", or any request about instrumenting GenAI/LLM applications.

ユースケース

✓GenAIやLLMアプリを計測・監視したい
✓プロンプトやレスポンスの内容を取得したい
✓AIエージェントの失敗を検出したい
✓LLM呼び出しにトレース機能を追加したい

本文

GenAI Instrumentation for Honeycomb

Instrumenting LLM and agent applications using OTel Semantic Conventions for GenAI (currently v1.40.0, Development status). For conceptual foundations, see the observability-fundamentals skill.

Base OTEL Setup (Required First)

BEFORE implementing GenAI instrumentation, ensure your base OpenTelemetry configuration is complete.

Use the otel-instrumentation skill to configure all standard OTEL environment variables (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, OTEL_EXPORTER_OTLP_PROTOCOL, signal-specific endpoints, etc.) and verify basic spans are flowing to Honeycomb.

GenAI instrumentation adds GenAI-specific configuration on top of that base setup.

Critical Requirements (Non-Negotiable)

BEFORE implementing any GenAI instrumentation, complete these steps in order:

Step 1: Ask About Content Capture (FIRST!)

Stop and ask the user this question BEFORE writing any code or configuration:

"Do you want to capture the actual prompts and model responses in your traces?

Enabling content capture:

✅ Helps debug tool call failures, planning loops, and agent deadlocks

✅ Lets you see why the model made specific decisions

❌ Captures potentially sensitive content (user prompts, model responses)

❌ May contain PII, proprietary data, or confidential information

Recommended for: debugging/development, non-sensitive data, or if you have filtering

Not recommended for: production with sensitive data, PII/health/financial info"

Record their answer — you'll need it when configuring instrumentation.

Step 2: Enable GenAI Conventions (REQUIRED)

export OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental

Without this, GenAI spans will not be created.

Step 3: Set Required Attributes on EVERY Span (REQUIRED)

gen_ai.operation.name — e.g., chat, execute_tool, invoke_agent
gen_ai.conversation.id — same value for all spans in a conversation

Impact if missing: Spans won't be recognized as GenAI operations and cannot be queried by session.

Step 4: Implement force_flush() (REQUIRED)

GenAI apps often exit early (crash, Ctrl+C, CLI). Force flush after each top-level invocation to prevent silent span loss.

For OTLP configuration, environment variables, and Honeycomb authentication (including the silent-rejection pitfall), see the otel-instrumentation skill.

Prerequisites

This skill assumes your agent application is already sending telemetry to Honeycomb. You should have:

OpenTelemetry SDK installed and initialized
All standard OTEL environment variables configured (see Base OTEL Setup section above)
OTLP exporter configured with your Honeycomb API key
Basic spans flowing to Honeycomb

If you haven't set this up yet, use the otel-instrumentation skill first for:

SDK setup and dependencies
OTEL environment variables (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_*, etc.)
OTLP configuration and Honeycomb authentication
Verification that spans are flowing

Once base telemetry is working, return here to add GenAI-specific instrumentation.

Auto-Instrumentation (Python and Node.js)

Python and Node.js have official OTel auto-instrumentation packages for GenAI providers. Go, Java, etc. require manual instrumentation (section below).

Python

Package	Provider	Min SDK Version
`opentelemetry-instrumentation-openai-v2`	OpenAI	openai >= v1.26.0
`opentelemetry-instrumentation-anthropic`	Anthropic	anthropic >= v0.16.0
`opentelemetry-instrumentation-claude-agent-sdk`	Claude Agent SDK	claude-agent-sdk >= v0.1.14
`opentelemetry-instrumentation-google-genai`	Google GenAI	google-genai >= v1.32.0
`opentelemetry-instrumentation-vertexai`	Vertex AI	google-cloud-aiplatform >= v1.64
`opentelemetry-instrumentation-langchain`	LangChain	langchain >= v0.3.21
`opentelemetry-instrumentation-openai-agents-v2`	OpenAI Agents	openai-agents >= v0.3.3
`opentelemetry-instrumentation-weaviate`	Weaviate	weaviate-client >= v3.0.0, < v5.0.0

Setup: pip install <package> + Instrumentor().instrument() or CLI opentelemetry-instrument.

Node.js

Package	Provider	Min SDK Version
`@opentelemetry/instrumentation-openai`	OpenAI	openai >= 4.19.0
`@opentelemetry/instrumentation-langchain`	LangChain	langchain >= 1.0.0 (not yet published to npm)

Setup: npm install <package> + register via OTel Node SDK.

For per-provider install commands, upstream README links, and supported version details, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/auto-instrumentation-setup.md.

Manual Instrumentation

For languages without auto-instrumentation (Go, Java, etc.) or when auto-instrumentation doesn't cover your needs.

Key patterns:

Creating inference spans (chat, text_completion, generate_content)
Creating embedding and retrieval spans
Setting request attributes before the call, response/usage attributes after
Error handling with error.type and span status

For code examples in Python, Node.js, and Go, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/manual-instrumentation.md.

Span Flushing for GenAI Apps

Critical for GenAI applications. The BatchSpanProcessor buffers spans (default 5 s schedule delay). GenAI agent runs are long-lived but may exit before the batch flushes — crash, Ctrl+C, short CLI invocations — causing silent span loss.

Rule: force-flush after every top-level agent invocation. Expose the span processor and call forceFlush() without tearing down the SDK, so subsequent invocations continue producing spans.

Why `shutdown()` is wrong here

sdk.shutdown() tears down the entire pipeline — after shutdown, no new spans are recorded. For apps that run multiple agent invocations (polling loops, HTTP servers, CLI batch modes), you need spans to keep flowing. Use forceFlush() instead.

Python

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

span_processor = BatchSpanProcessor(exporter)
provider = TracerProvider()
provider.add_span_processor(span_processor)

async def flush_telemetry():
    """Flush pending spans without shutting down."""
    span_processor.force_flush()

Node.js

import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";

let spanProcessor: BatchSpanProcessor | null = null;

export function initTelemetry(): void {
  // ... exporter setup ...
  spanProcessor = new BatchSpanProcessor(traceExporter);
  sdk = new NodeSDK({ spanProcessors: [spanProcessor], /* ... */ });
  sdk.start();
}

export async function flushTelemetry(): Promise<void> {
  if (spanProcessor) {
    await spanProcessor.forceFlush();
  }
}

Go

var spanProcessor *sdktrace.BatchSpanProcessor

func InitTelemetry() {
    spanProcessor = sdktrace.NewBatchSpanProcessor(exporter)
    // ... provider setup ...
}

func FlushTelemetry(ctx context.Context) error {
    return spanProcessor.ForceFlush(ctx)
}

Where to call `flushTelemetry()`

After each agent invocation — ensures the full trace (agent + chat + tool spans) is exported before moving to the next task
In polling/server loops — flush after processing each request or ticket
Before process.exit() — as a safety net alongside shutdownTelemetry()
NOT inside the agent loop — flushing per-chat-turn adds latency; flush once at the outer boundary

Example integration:

for (const ticket of tickets) {
  await triageIssue(ticket);   // produces invoke_agent + chat + tool spans
  await flushTelemetry();      // ensure spans are exported before next ticket
}

For complete code examples showing flush integration with tool-calling loops, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/manual-instrumentation.md.

GenAI Span Types

Span names MUST follow the pattern "{operation} {identifier}". The gen_ai.operation.name attribute and the span name prefix must match. For example, a span with gen_ai.operation.name = "invoke_agent" must be named "invoke_agent {agent_name}", not "mypackage.DoSomething".

Operation	`gen_ai.operation.name`	SpanKind	Span Name
Chat/completion	`chat`	CLIENT	`chat {model}`
Text completion	`text_completion`	CLIENT	`text_completion {model}`
Content generation	`generate_content`	CLIENT	`generate_content {model}`
Embeddings	`embeddings`	CLIENT	`embeddings {model}`
RAG retrieval	`retrieval`	CLIENT	`retrieval {data_source}`
Tool execution	`execute_tool`	INTERNAL	`execute_tool {tool_name}`
Agent creation	`create_agent`	CLIENT	`create_agent {agent_name}`
Agent invocation	`invoke_agent`	CLIENT/INTERNAL	`invoke_agent {agent_name}`
Workflow step	`invoke_workflow`	INTERNAL	`invoke_workflow {workflow_name}`

Required Attributes on All GenAI Spans

CRITICAL: Every GenAI span MUST include these two attributes. This is non-negotiable.

gen_ai.operation.name — Identifies the operation type (chat, embeddings, execute_tool, invoke_agent, etc.).
- Without this: The span is not recognized as a GenAI operation and will be excluded from GenAI-specific queries and visualizations in Honeycomb
- Set on EVERY span: chat, execute_tool, invoke_agent, embeddings, retrieval, etc.
gen_ai.conversation.id — Ties operations together within a conversation or session.
- Without this: Spans cannot be queried as part of a multi-operation workflow, breaking session-level analysis
- Use the SAME value across all operations in a conversation thread (user request → agent invocation → chat calls → tool executions → responses)
- Generate once at the start of a conversation, propagate to all operations

When to set: When creating the span (in the span attributes), not after.

How to propagate conversation_id:

In-process: Pass as parameter or store in context
HTTP/A2A: Include in request payload or propagate via headers

Impact of missing these attributes:

Missing gen_ai.operation.name → Span not recognized as GenAI operation, excluded from GenAI-specific queries and visualizations
Missing gen_ai.conversation.id → Span excluded from session queries, cannot correlate operations within a conversation, breaks multi-turn analysis

What is a conversation?

A conversation is a customer session or user interaction, NOT a single LLM call. One conversation contains:

Multiple user turns/messages
All LLM calls handling those turns
All tool executions triggered by those LLM calls
All agent invocations within that session

See the OTel GenAI spec for the definition. Key principle: use the same conversation.id when conversation history/context is maintained across operations.

When to use the same conversation_id:

All operations within a single customer session
All turns in a multi-turn interaction
All LLM calls handling those turns
All tool executions and agent invocations within that session
Multiple agents participating in the same session

Example: User starts a support session. Over the next 10 minutes they send 5 messages. The assistant makes 15 LLM calls and executes 8 tools to handle those messages. ALL of these spans share the SAME conversation.id because they're part of one customer session.

Common mistake: Generating a new conversation_id for each LLM call. This breaks session-level analysis. Generate conversation_id ONCE at session start, reuse for all operations until session ends.

For trace structures showing how these spans compose (tool-calling loops, multi-turn conversations, nested agents, workflows), see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/agent-and-tool-patterns.md.

A2A / HTTP-based agent delegation: When agents communicate over HTTP (A2A protocol, REST delegation), manually propagate both trace context (via headers) AND conversation.id (via payload). Client: propagation.inject() + include conversation.id in request body. Server: propagation.extract() + context.with() + extract conversation.id from payload and pass to all operations. See the "A2A (Agent-to-Agent) HTTP Context Propagation" section in the reference file above.

Generating and Propagating Conversation ID

Generate conversation_id at your application's session boundary:

Chat apps: when user opens new chat/thread
Support systems: when customer starts session
CLI tools: at command invocation
HTTP APIs: when session/conversation is created
Bots: when user starts thread/DM

Pass the SAME conversation_id to all operations within that session — all user turns, all LLM calls handling those turns, all tool executions, all agent invocations.

Propagation methods:

In-process: store in session object, pass as parameter
HTTP/microservices: include in request payload or header (X-Conversation-ID)
Bots: store in state (Redis, DB), retrieve using thread/DM ID

Attribute Completeness

Set all attributes for which you have data available. The OTel GenAI semantic conventions define comprehensive attributes for each operation type — if your application has the data (model name, tokens, tool arguments, etc.), set the corresponding attribute.

Critical principle: Don't selectively omit attributes. Incomplete instrumentation limits your ability to:

Identify which models and agents were involved in a trace
Track token usage and costs across operations
Debug tool call failures (missing arguments/results)
Understand conversation flow (missing messages)
Correlate agent behavior with configuration (missing request parameters)

For the full attribute definitions by operation type, see the upstream semantic conventions:

Model operations (chat, embeddings): https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
Agent operations (invoke_agent, execute_tool): https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
Local reference: ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/genai-attributes-catalog.md

What "data available" means:

API response fields → set corresponding response attributes (model, tokens, finish_reasons, response_id)
Request parameters → set request attributes (temperature, max_tokens, top_p, etc.)
Agent metadata → set agent attributes (name, id, description, version)
Tool execution → set tool attributes (name, call_id, arguments, result)
Conversation context → set conversation_id on ALL GenAI spans (required, not optional) — use the same ID across all operations in a conversation thread

The code examples in this skill show core attributes for each operation type. For complete coverage, consult the upstream spec and instrument every attribute your application can populate.

Impact of incomplete instrumentation:

Missing gen_ai.operation.name → span not recognized as GenAI operation, excluded from GenAI queries
Missing gen_ai.conversation.id → span excluded from session queries, cannot correlate operations within a conversation
Missing gen_ai.request.model / gen_ai.response.model → can't identify which model was used
Missing gen_ai.usage.* tokens → can't track costs or identify expensive operations
Missing gen_ai.tool.call.arguments / gen_ai.tool.call.result → can't debug why tools failed or returned unexpected results
Missing gen_ai.input.messages / gen_ai.output.messages → can't see what prompted a response, can't debug planning loops or hallucinations
Missing agent attributes → can't distinguish between agents in multi-agent systems
Missing request parameters → can't correlate behavior with temperature, top_p, etc.

Best practice: Instrument completely from the start. Adding attributes later requires code changes, redeployment, and waiting for new traces to arrive.

Telemetry by Failure Mode

For each failure mode, the listed telemetry enables effective debugging. Items marked [Content Capture] require enabling content capture — ask the user before enabling these.

Tool Call Failures

Span execute_tool: gen_ai.tool.name, gen_ai.tool.call.id, gen_ai.agent.name, gen_ai.conversation.id, error.type, status.code=ERROR, duration, gen_ai.tool.call.arguments, gen_ai.tool.call.result
Metric: gen_ai.client.operation.duration
[Content Capture]: gen_ai.input.messages (tool_call + tool_call_response parts) — shows full context of tool calls (optional, requires user consent)

Network Failures During Retrieval

Span retrieval: gen_ai.data_source.id, server.address, server.port, error.type, status.code=ERROR, duration
Metric: gen_ai.client.operation.duration

Long Time-to-First-Token

Span chat: gen_ai.request.model, gen_ai.usage.input_tokens, server.address, duration
Metrics: gen_ai.client.operation.time_to_first_chunk (hosted APIs) or gen_ai.server.time_to_first_token (self-hosted)
Also: gen_ai.server.time_per_output_token, gen_ai.agent.name

Excessive Planning / Retry Loops

Parent invoke_agent: gen_ai.agent.name, gen_ai.usage.input_tokens, duration
Children execute_tool: gen_ai.tool.name, gen_ai.tool.call.arguments, gen_ai.tool.call.result
Metric: gen_ai.client.token.usage
[Content Capture]: gen_ai.output.messages — model reasoning reveals loop cause (optional but very helpful, requires user consent)

Slow Retrieval

Span retrieval: gen_ai.data_source.id, server.address, server.port, status.code=OK, duration
Metric: gen_ai.client.operation.duration

Agent Deadlocks

Span invoke_agent: gen_ai.agent.name, gen_ai.agent.id, gen_ai.conversation.id, error.type=TimeoutError, span links, duration
Metric: gen_ai.client.operation.duration
[Content Capture]: gen_ai.output.messages (tool_call parts) — reveals circular delegation (optional but very helpful, requires user consent)

Content Capture (Ask User First)

CRITICAL: Do NOT enable content capture without asking the user first.

Step 1: Ask the User

Before providing any configuration, ask this question:

"Do you want to capture the actual prompts and model responses in your traces?

Enabling content capture:

✅ Helps debug tool call failures, planning loops, and agent deadlocks

✅ Lets you see why the model made specific decisions

❌ Captures potentially sensitive content (user prompts, model responses)

❌ May contain PII, proprietary data, or confidential information

Recommended if: debugging/development, non-sensitive data, or you have filtering in place

Not recommended if: production with sensitive data, PII/health/financial info, no filtering"

Step 2: Configure Based on Answer

If user says YES to content capture:

For auto-instrumentation (Python), set the capture mode:

# Recommended for Honeycomb: Capture as span attributes (fully queryable)
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only

Why span_only for Honeycomb:

Content stored as span attributes → fully queryable in Honeycomb
Can filter, group, and visualize by message content
Lower overhead than span_and_event

Alternative modes (less common):

# Events only - for high-volume scenarios where you want content in logs but not queryable
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=event_only

# Both spans and events - most complete but higher overhead
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_and_event

# Legacy boolean - deprecated, use span_only instead
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true

Mode comparison:

span_only → Content in span attributes (queryable, recommended for Honeycomb)
event_only → Content in events (logging, not queryable)
span_and_event → Both (most complete, 2x overhead)
true → Legacy (maps to old behavior, deprecated)

For manual instrumentation:

Set gen_ai.input.messages on chat spans (before the call)
Set gen_ai.output.messages on chat spans (after the call)

If user says NO to content capture:

Do NOT set OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT (leave unset).

Do NOT include gen_ai.input.messages or gen_ai.output.messages in manual instrumentation.

ALWAYS include regardless of content capture setting:

gen_ai.tool.call.arguments on execute_tool spans
gen_ai.tool.call.result on execute_tool spans

Tool arguments/results are essential for debugging and are typically less sensitive than full conversation content.

What Content Capture Provides

When enabled, gen_ai.input.messages and gen_ai.output.messages show the full conversation — what the user sent, what the model returned, and how tool results were fed back. Without them, you can see that a chat span happened but not why the model made a particular decision.

Example: .env Configuration

If user wants content capture:

# .env

# Base OTEL setup - see otel-instrumentation skill for:
#   OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT,
#   OTEL_EXPORTER_OTLP_HEADERS, OTEL_EXPORTER_OTLP_PROTOCOL, etc.

# GenAI-specific configuration (REQUIRED)
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental

# Content capture (OPTIONAL - ask user first)
# Recommended for Honeycomb: span attributes (queryable)
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only

# Other content capture options (uncomment one if needed):
# OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=event_only  # Events only, not queryable
# OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_and_event  # Both (2x overhead)
# OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true  # Legacy (deprecated)

If user does NOT want content capture:

# .env

# Base OTEL setup - see otel-instrumentation skill for:
#   OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_ENDPOINT,
#   OTEL_EXPORTER_OTLP_HEADERS, OTEL_EXPORTER_OTLP_PROTOCOL, etc.

# GenAI-specific configuration (REQUIRED)
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental
# OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT not set (disabled by default)

What Gets Captured

Content capture enabled (span_only, event_only, or span_and_event):

gen_ai.input.messages — Full prompts sent to model
gen_ai.output.messages — Full model responses
gen_ai.system_instructions — System prompts
gen_ai.tool.definitions — Available tools

Capture mode determines where content is stored:

span_only → Span attributes (queryable in Honeycomb, recommended)
event_only → Event attributes (logging/archival, not queryable in Honeycomb)
span_and_event → Both locations (most complete, double storage/overhead)
true → Legacy mode (deprecated, use span_only)

Content capture disabled (default):

Model name, tokens, finish_reasons, timing — YES (always captured)
Prompt/response content — NO
Tool arguments/results — YES (always recommended)

Message JSON schema: role + parts (text, tool_call, tool_call_response, reasoning); tool_call_response uses response field (not content) for the tool result.

Privacy Controls (If Content Capture Enabled)

If the user enables content capture, recommend these additional safeguards:

Filtering: Capture selectively (e.g., exclude messages with PII)
Truncation: Limit content size (e.g., first 500 chars only)
Hooks: Route to separate access-controlled storage
Access control: Restrict who can query message content in Honeycomb
Environment-based: Full content in dev/test, disabled or filtered in prod

Example filtering pattern (Python):

# Only capture if no PII detected
if not contains_pii(message_content):
    span.set_attribute("gen_ai.input.messages", json.dumps(messages))

Example truncation (any language):

# Limit to first 500 characters
truncated = json.dumps(messages)[:500]
span.set_attribute("gen_ai.input.messages", truncated)

For complete setup including message JSON schemas, per-provider examples, and privacy patterns, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/content-capture-setup.md.

Streaming Instrumentation

Streaming (SSE, chunked responses) requires dedicated metrics and span patterns.

Key metrics:

gen_ai.client.operation.time_to_first_chunk — client-observed time until first streamed chunk (includes network latency); use for hosted APIs
gen_ai.server.time_to_first_token — server-side TTFT (queue + prefill); use for self-hosted (vLLM, TGI)
gen_ai.server.time_per_output_token — decode speed after first token
gen_ai.client.operation.time_per_output_chunk — client-observed inter-chunk time

The span covers the full stream lifetime. Set usage attributes after stream completes. Handle mid-stream errors by recording the error and setting span status before closing.

For streaming span lifecycle, code examples, and error handling patterns, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/streaming-instrumentation.md.

Evaluation Events

gen_ai.evaluation.result event captures scoring/evaluation of GenAI output.

Attribute	Requirement	Description
`gen_ai.evaluation.name`	Required	Evaluation name (e.g., "relevance", "faithfulness")
`gen_ai.evaluation.score.value`	Recommended	Numeric score
`gen_ai.evaluation.score.label`	Recommended	Categorical label (e.g., "pass", "fail")
`gen_ai.evaluation.explanation`	Recommended	Why this score was given
`gen_ai.response.id`	Recommended	Links evaluation to the inference it scored

Use cases: RAG relevance scoring, hallucination detection, output quality gates.

Metrics

Metric	Type	Unit	Purpose
`gen_ai.client.operation.duration`	Histogram	s	End-to-end latency
`gen_ai.client.token.usage`	Histogram	{token}	Input/output token counts
`gen_ai.client.operation.time_to_first_chunk`	Histogram	s	Streaming TTFC
`gen_ai.client.operation.time_per_output_chunk`	Histogram	s	Streaming inter-chunk
`gen_ai.server.request.duration`	Histogram	s	Server-side latency
`gen_ai.server.time_to_first_token`	Histogram	s	Server TTFT
`gen_ai.server.time_per_output_token`	Histogram	s	Server decode speed
`mcp.client.operation.duration`	Histogram	s	MCP client latency
`mcp.server.operation.duration`	Histogram	s	MCP server latency

For the required x-honeycomb-dataset metrics header, see the otel-instrumentation skill.

MCP Instrumentation

Model Context Protocol instrumentation uses OTel context propagation via params._meta (W3C traceparent/tracestate).

Client spans (CLIENT) for MCP calls, server spans (SERVER) for MCP handlers
Key attributes: mcp.method.name, mcp.session.id, mcp.protocol.version
Metrics: mcp.client.operation.duration, mcp.server.operation.duration

For context propagation details, well-known method names, and code examples, see ${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/mcp-instrumentation.md.

Known Gaps & Workarounds

Gap	Workaround
No retry/loop count attribute	Count child spans or diff `tool.call.arguments` across siblings
No inter-agent dependency (in-process)	Span links + `gen_ai.conversation.id`
No inter-agent dependency (HTTP/A2A)	Manual `propagation.inject()` / `extract()` — see agent-and-tool-patterns ref
No retrieval sub-metrics	Custom attributes on retrieval spans
`error.type` is only error signal	Custom attributes for severity/category

Provider-Specific Notes

Anthropic: cache token accounting, gen_ai.provider.name = "anthropic"
OpenAI: system_fingerprint, service tier, gen_ai.provider.name = "openai"
AWS Bedrock: aws.bedrock.guardrail.id, knowledge base attributes
Azure AI: azure.resource_provider.namespace

Additional Resources

Reference Files

${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/auto-instrumentation-setup.md — Python + Node.js: per-provider install, upstream README links, supported versions
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/manual-instrumentation.md — Code examples in Python/Node.js/Go for all span types
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/genai-attributes-catalog.md — Upstream semconv links + message JSON schema gotchas
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/agent-and-tool-patterns.md — Trace diagrams: tool-calling loop, multi-turn, nested agents, workflow
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/mcp-instrumentation.md — MCP context propagation, span conventions, method names, metrics
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/streaming-instrumentation.md — Streaming span lifecycle, TTFT/TTFC metrics, mid-stream errors, code examples
${CLAUDE_PLUGIN_ROOT}/skills/otel-genai-instrumentation/references/content-capture-setup.md — Env var + manual setup, message JSON schemas, privacy controls

Cross-References

BEFORE using this skill: Use otel-instrumentation for base SDK setup, all OTEL environment variables (OTEL_SERVICE_NAME, OTEL_EXPORTER_OTLP_*, OTEL_EXPORTER_OTLP_HEADERS, etc.), OTLP config, collector, and sampling
For conceptual foundations of wide events and high cardinality: observability-fundamentals skill
After instrumenting, use the query-patterns skill to verify GenAI data in Honeycomb

原文・著作権は Anthropic および各プラグイン作者に帰属します。日本語訳は Claude API による自動翻訳です。

🔍otel-genai-instrumentation

説明

ユースケース

本文

GenAI Instrumentation for Honeycomb

Base OTEL Setup (Required First)

Critical Requirements (Non-Negotiable)

Step 1: Ask About Content Capture (FIRST!)

Step 2: Enable GenAI Conventions (REQUIRED)

Step 3: Set Required Attributes on EVERY Span (REQUIRED)

Step 4: Implement force_flush() (REQUIRED)

Prerequisites

Auto-Instrumentation (Python and Node.js)

Python

Node.js

Manual Instrumentation

Span Flushing for GenAI Apps

Why shutdown() is wrong here

Python

Node.js

Go

Where to call flushTelemetry()

GenAI Span Types

Required Attributes on All GenAI Spans

Generating and Propagating Conversation ID

Attribute Completeness

Telemetry by Failure Mode

Tool Call Failures

Network Failures During Retrieval

Long Time-to-First-Token

Excessive Planning / Retry Loops

Slow Retrieval

Agent Deadlocks

Content Capture (Ask User First)

Step 1: Ask the User

Step 2: Configure Based on Answer

What Content Capture Provides

Example: .env Configuration

What Gets Captured

Privacy Controls (If Content Capture Enabled)

Streaming Instrumentation

Evaluation Events

Metrics

MCP Instrumentation

Known Gaps & Workarounds

Provider-Specific Notes

Additional Resources

Reference Files

Cross-References

Why `shutdown()` is wrong here

Where to call `flushTelemetry()`