Skill · Official · monitoring

🍯 query-patterns

Plugin: honeycomb

Description

Opinionated guidance for constructing and interpreting Honeycomb queries on trace and event datasets. It covers operation selection (percentiles rather than AVG, HEATMAP for distributions), relational field patterns (root., parent., any., none.), calculated fields, query math, and result interpretation (P99/P50 ratios, heatmap bands, TOTAL/OTHER rows, raw JSON via query_result_json).

Use this skill when the user wants to query spans, traces, or log/event data in Honeycomb, with requests like "show me latency", "error rate", "find slow requests", "find outliers", "interpret results", "relational fields", "calculated fields", or "download raw results".

This skill covers all dataset types except metrics datasets (dataset_type=metrics); for those, use metrics-queries instead.

Use cases

✓ Querying span, trace, or log data in Honeycomb
✓ Displaying latency or response times
✓ Detecting error rates or outliers
✓ Finding slow requests
✓ Interpreting query results

Honeycomb Query Patterns

Opinionated guidance for writing effective Honeycomb queries. The MCP tools already document their parameters and schemas — this skill focuses on when and why to use each pattern, not how to call the tools.

Key Principles

Never use AVG for latency — AVG hides tail latency. Use P99 (or P95/P90) to see what slow users experience. Reserve AVG for non-latency metrics like payload size.
Use HEATMAP for distributions — Single-number aggregates hide bimodal patterns. HEATMAP reveals whether you have one population or two.
Combine calculations in one query — COUNT, P99(duration_ms), HEATMAP(duration_ms) in a single query reduces API calls and gives a complete picture.
Start broad, narrow with WHERE — Begin with a COUNT/GROUP BY to understand shape, then add filters to focus.
Check for prior work — Call find_queries before writing new queries. Someone may have already answered the question.
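The first principle can be checked with a little arithmetic. The sketch below uses made-up latencies (95 fast requests, 5 slow ones) and a simple nearest-rank percentile helper; the helper and the data are illustrative assumptions, not how Honeycomb computes its percentiles.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceil(n * pct / 100)
    return ordered[rank - 1]


# Synthetic latencies in milliseconds (made-up data): 95 fast requests, 5 slow ones.
durations_ms = [20] * 95 + [2000] * 5

avg = statistics.mean(durations_ms)
p50 = percentile(durations_ms, 50)
p99 = percentile(durations_ms, 99)

print(f"AVG={avg:.0f}ms  P50={p50}ms  P99={p99}ms")
# prints: AVG=119ms  P50=20ms  P99=2000ms
```

The average of 119 ms describes no real request: typical users see 20 ms and the slow ones see 2000 ms, which is what P50 and P99 report.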

Choosing the Right Operation

Question	Use
How much traffic?	`COUNT` grouped by route or service
How many unique users/IPs?	`COUNT_DISTINCT(field)`
How fast for most users?	`P50(duration_ms)`
How fast for the worst-off users?	`P99(duration_ms)`
Is there a bimodal pattern?	`HEATMAP(duration_ms)`
What's the worst case?	`MAX(duration_ms)`
How many concurrent operations?	`CONCURRENCY`
Is it getting worse over time?	`RATE_AVG(duration_ms)`
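As a rough illustration of combining several of these operations in one request, the sketch below builds a query specification as a plain Python dictionary. The key names (`time_range`, `breakdowns`, `calculations`, `filters`, `limit`) follow the general shape of Honeycomb's Query Specification as I understand it; the MCP tool's own schema is the authority, and the `is_root` filter form in particular is an assumption. The helper name `build_latency_query` is hypothetical.

```python
import json


def build_latency_query(group_by, time_range_seconds=24 * 60 * 60):
    """Build one query spec that answers several latency questions at once."""
    return {
        "time_range": time_range_seconds,  # seconds of history to scan
        "breakdowns": [group_by],          # GROUP BY column
        "calculations": [
            {"op": "COUNT"},
            {"op": "P50", "column": "duration_ms"},
            {"op": "P99", "column": "duration_ms"},
            {"op": "HEATMAP", "column": "duration_ms"},
        ],
        # Assumption: root spans are selected via is_root; adjust the column
        # and operator to whatever the dataset and tool actually accept.
        "filters": [{"column": "is_root", "op": "=", "value": True}],
        "limit": 100,
    }


spec = build_latency_query("http.route")
print(json.dumps(spec, indent=2))
```

Keeping COUNT, P50, P99, and HEATMAP in the same specification means one round trip returns volume, typical latency, tail latency, and the distribution for each route.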

Relational Field Strategy

Use relational prefixes to ask cross-span questions within a trace:

"Show me slow endpoints caused by a specific downstream": Filter with any.service.name to find traces where that service participates, group by root.http.route to see which user-facing endpoints are affected.
"What's different about errored traces?": Filter with any.error = true, group by root.name to see which entry points have errors somewhere in their trace tree.
Exclude noise: none.service.name = "health-check" removes traces containing health checks.
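To make the semantics of the prefixes concrete, here is a small in-memory emulation over a list of span dictionaries. It is only a sketch of the idea: the span layout (`trace_id`, `parent_id`, and so on) and the function names are hypothetical, it counts traces rather than spans for simplicity, and it says nothing about how Honeycomb evaluates these filters internally.

```python
from collections import Counter, defaultdict


def group_spans_by_trace(spans):
    """Collect spans into lists keyed by trace ID."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span)
    return traces


def affected_routes(spans, downstream_service, excluded_service=None):
    """Count traces per root route where a given service participates."""
    counts = Counter()
    for trace_spans in group_spans_by_trace(spans).values():
        services = {s["service.name"] for s in trace_spans}
        if downstream_service not in services:   # like any.service.name = X
            continue
        if excluded_service in services:         # like none.service.name = Y
            continue
        root = next(s for s in trace_spans if s["parent_id"] is None)
        counts[root["http.route"]] += 1          # like GROUP BY root.http.route
    return counts


spans = [
    {"trace_id": "t1", "parent_id": None, "service.name": "frontend", "http.route": "/checkout"},
    {"trace_id": "t1", "parent_id": "a", "service.name": "payments"},
    {"trace_id": "t2", "parent_id": None, "service.name": "frontend", "http.route": "/search"},
    {"trace_id": "t2", "parent_id": "c", "service.name": "catalog"},
    {"trace_id": "t3", "parent_id": None, "service.name": "frontend", "http.route": "/checkout"},
    {"trace_id": "t3", "parent_id": "e", "service.name": "payments"},
    {"trace_id": "t3", "parent_id": "e", "service.name": "health-check"},
]

print(affected_routes(spans, "payments"))
print(affected_routes(spans, "payments", excluded_service="health-check"))
```

The filter is evaluated against every span in the trace, while the grouping key is read from the root span only, which is what lets a downstream problem be attributed to a user-facing endpoint.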

Calculated Fields

Calculated fields are per-event expressions evaluated at query time. They transform, classify, and combine existing fields without re-instrumenting code.

Three scopes — choose the narrowest that fits the need:

Query-scoped (not saved): exploratory, one-off analysis
Dataset-level (saved): reusable within one service's dataset
Environment-level (saved): reusable across all datasets (e.g., error_pct)

Common patterns:

Error rate: define error_pct as MUL(IF($error, 1, 0), 100), then use AVG(error_pct) to get the error percentage
Status classification: IF(GTE($http.status_code, 500), "5xx", GTE($http.status_code, 400), "4xx", "ok")
Latency bucketing: BUCKET($duration_ms, 500, 0, 3000)
Prefix routing: IF(STARTS_WITH($url, "/admin"), "admin", STARTS_WITH($url, "/api"), "api", "other")
Exact-match classification: use SWITCH instead of IF(EQUALS(...)) chains; it yields the same result and is more efficient
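The error-rate and status-classification patterns can be mirrored in plain Python to show why they work. This is an analogy on made-up events, not Honeycomb's evaluator; the function names are hypothetical.

```python
def error_pct(event):
    """Mirror of MUL(IF($error, 1, 0), 100): 100 for errored events, 0 otherwise."""
    return (1 if event.get("error") else 0) * 100


def status_class(event):
    """Mirror of the chained IF(GTE(...)) classification; the first match wins."""
    code = event["http.status_code"]
    if code >= 500:
        return "5xx"
    if code >= 400:
        return "4xx"
    return "ok"


# Made-up events: one error out of four.
events = [
    {"http.status_code": 200, "error": False},
    {"http.status_code": 200, "error": False},
    {"http.status_code": 404, "error": False},
    {"http.status_code": 503, "error": True},
]

# Averaging a per-event 0/100 value yields the percentage of errored events.
avg_error_pct = sum(error_pct(e) for e in events) / len(events)
print(avg_error_pct)                      # prints: 25.0
print([status_class(e) for e in events])  # prints: ['ok', 'ok', '4xx', '5xx']
```

Because each event contributes either 0 or 100, the average is the error percentage directly, with no separate division of two counts.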

Key guardrails:

Don't create presentational (alias-only) fields — a field that just renames another field adds no analytical value and clutters the schema. Only save a calculated field when it does real computation (classification, extraction, math).
Avoid regex on large/complex fields — running REG_MATCH, REG_VALUE, or REG_COUNT on exception.stacktrace, db.statement, or full log lines can be very slow. Check whether a more targeted OTel field exists first (exception.type, exception.message, db.operation). If you must regex a long field, guard it with a CONTAINS check first.
EQUALS has strict type matching — EQUALS($http.status_code, 200) silently returns false if the field is stored as a string. Use find_columns to verify the field type before comparing.
FORMAT_TIME is expensive — avoid in high-volume queries.
Use query-scoped fields, not saved dataset-level ones, for one-off work — saved fields show up in everyone's schema.
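Two of these guardrails have close analogues in ordinary Python, sketched below. The comparison shows how a string-typed status code fails an equality test against a number without raising any error, and the helper shows the "cheap substring check before regex" idea. The function name, the regex, and the sample statements are illustrative assumptions.

```python
import re

# Hypothetical pattern: pull the first table name out of a SQL statement.
SQL_TABLE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)


def table_name(statement):
    """Extract a table name, skipping the regex when a cheap check rules it out."""
    if "from" not in statement.lower():  # cheap guard, like CONTAINS before REG_VALUE
        return None
    match = SQL_TABLE.search(statement)
    return match.group(1) if match else None


# Type mismatch: the comparison is simply False, with no warning.
stored_as_string = "200"
print(stored_as_string == 200)       # prints: False
print(int(stored_as_string) == 200)  # prints: True

print(table_name("SELECT * FROM users WHERE id = 1"))  # prints: users
print(table_name("UPDATE carts SET total = 0"))        # prints: None
```

The guard pays off when most values cannot match, since the substring test is far cheaper than running the pattern over every long field.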

For full syntax, operator reference, and extended anti-pattern examples, consult ${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md.

Before Every Query

Filter on is_root when measuring user-facing latency — without it, internal spans inflate the numbers
Use human-readable time ranges ("24h", "-6h") — epoch timestamps are error-prone and hard to review
Validate columns with find_columns before querying — confirms field names exist and prevents empty results
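The column check can be reduced to a set comparison once the column names are in hand. The sketch below assumes the names returned by find_columns have already been collected into a list; the helper `missing_columns` and the sample names are hypothetical.

```python
def missing_columns(query_columns, known_columns):
    """Return the columns a query references that the dataset does not have."""
    known = set(known_columns)
    return [col for col in query_columns if col not in known]


# Assumed to come from a find_columns call made beforehand.
known = ["duration_ms", "http.route", "http.status_code", "service.name"]

wanted = ["duration_ms", "http.route", "http.status"]
print(missing_columns(wanted, known))  # prints: ['http.status']
```

A non-empty result is a signal to fix the field name before running the query, rather than interpreting an empty result afterwards.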

Interpreting Results

After running a query, the MCP tool returns formatted markdown plus metadata. The most important metadata field is query_result_json, a signed URL to the raw JSON result. For precise analysis, download it and parse it with jq or Python rather than relying solely on the ASCII rendering.
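A minimal way to pull the raw result into Python uses only the standard library. The sketch makes no assumption about the JSON layout, which this guide does not describe, so it only reports the top-level structure; the function names and the injectable `opener` parameter are illustrative choices.

```python
import json
import urllib.request


def fetch_result(signed_url, opener=urllib.request.urlopen):
    """Download and decode the raw JSON behind a query_result_json URL."""
    with opener(signed_url) as response:
        return json.load(response)


def describe(result):
    """Summarize the top-level shape without assuming a particular schema."""
    if isinstance(result, dict):
        return sorted(result.keys())
    if isinstance(result, list):
        return [f"list of {len(result)} items"]
    return [type(result).__name__]


# Usage, with the URL taken from the tool's metadata:
#   result = fetch_result(metadata["query_result_json"])
#   print(describe(result))
```

Inspecting the top-level keys first is a cautious starting point before writing any field-specific parsing. Signed URLs typically expire, so it is safer to download soon after the query runs.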

Key interpretation rules:

P99/P50 > 10x — bimodal distribution likely; run HEATMAP to confirm
TOTAL row in breakdown results = aggregate across all groups
OTHER row = groups beyond the query limit (increase limit if OTHER is large)
ASCII heatmap ▁▂▃▄▅▆▇█ = density from low to high; two bands = two populations
query_run_pk in metadata — feed directly to run_bubbleup for outlier analysis
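The first and third rules are simple arithmetic and can be applied mechanically to parsed results. In the sketch below the row layout (a `group` key and a `COUNT` key) is a hypothetical stand-in for whatever the raw JSON actually contains, and the helper names are illustrative; the threshold of 10 comes from the rule above.

```python
def likely_bimodal(p50, p99, threshold=10.0):
    """Apply the P99/P50 > 10x heuristic; a HEATMAP should confirm the result."""
    if p50 <= 0:
        return False
    return p99 / p50 > threshold


def other_share(rows, group_key="group", count_key="COUNT"):
    """Fraction of events that fell into the OTHER row (TOTAL is excluded)."""
    groups = [r for r in rows if r[group_key] != "TOTAL"]
    total = sum(r[count_key] for r in groups)
    other = sum(r[count_key] for r in groups if r[group_key] == "OTHER")
    return other / total if total else 0.0


# Made-up breakdown rows in an assumed layout.
rows = [
    {"group": "/checkout", "COUNT": 600},
    {"group": "/search", "COUNT": 300},
    {"group": "OTHER", "COUNT": 100},
    {"group": "TOTAL", "COUNT": 1000},
]

print(likely_bimodal(p50=20, p99=2000))   # prints: True
print(likely_bimodal(p50=100, p99=400))   # prints: False
print(other_share(rows))                  # prints: 0.1
```

A large OTHER share suggests raising the query limit before drawing conclusions from the named groups.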

Additional Resources

Reference Files

${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/visualize-operations.md — Complete VISUALIZE operation reference with examples
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/relational-fields.md — Detailed relational field guide with cross-service patterns
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/query-examples.md — Extensive query cookbook organized by use case
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/result-interpretation.md — Guide to interpreting query results, raw JSON access, and statistical heuristics
${CLAUDE_PLUGIN_ROOT}/skills/query-patterns/references/calculated-fields.md — Calculated field syntax, full operator reference, common patterns, and anti-patterns (presentational fields, expensive string ops, type mismatches)

Cross-References

For the structured investigation workflow that uses these query patterns: production-investigation skill
For SLO interpretation and burn alert design: slos-and-triggers skill

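The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was generated automatically by the Claude API.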