📊metrics-queries
- Plugin: honeycomb
Description
How to query OpenTelemetry metrics datasets in Honeycomb correctly. Metrics datasets follow different rules from trace/event datasets — many operations (bare COUNT, RATE_SUM, RATE_AVG, RATE_MAX, CONCURRENCY) are forbidden, temporal aggregation is automatic, and each metric has its own attributes. Use this skill when querying a metrics dataset (gauges, counters, histograms, sums), asking about temporal aggregation (RATE, INCREASE, SUMMARIZE, LAST), finding the metrics dataset or discovering metric names and attributes, debugging unexpected metrics query results, or querying infrastructure metrics like CPU, memory, disk I/O, or network stats. Do NOT use for instrumenting metrics (use otel-instrumentation), querying event datasets with "metrics" in their name, or conceptual questions (use observability-fundamentals).
Use cases
- Query a metrics dataset
- Ask about temporal aggregation
- Find metric names and attributes
- Debug metrics query results
- Query infrastructure metrics
Body
Querying Metrics in Honeycomb
Metrics datasets in Honeycomb behave differently from tracing/event datasets. Operations that work on traces may fail or produce misleading results on metrics. This skill covers those differences so you construct correct, useful metrics queries.
Finding the Metrics Dataset
Metrics datasets are not identified by having "metrics" in their name. Many event
datasets contain "metrics" in their slug (e.g., kafka-metrics, refinery-metrics,
kubernetes-node-metrics). These are ordinary event datasets, not metrics datasets.
How to identify the real metrics dataset:
- Call `get_environment` and look for rows where `dataset_type=metrics`. The slug is typically `metrics` but may differ per environment.
- Alternatively, call `get_dataset_columns` on a candidate dataset. Metrics datasets return a `MetricInfo` column showing type metadata like `gauge`, `sum(cumulative,monotonic)`, or `histogram(delta)`. Event datasets do not have this.
Do not guess the dataset. Always verify via get_environment or get_dataset_columns
before constructing a metrics query. If the user says "metrics" but means an event dataset
with metrics-like fields (e.g., telegraf, system_stats), the query rules below do not apply —
those are event datasets and follow normal query patterns from the query-patterns skill.
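The verification step above can be sketched in Python. This is a minimal illustration, assuming the `get_environment` result has already been parsed into a list of dicts with `slug` and `dataset_type` keys. That shape, the sample `dataset_type` values for event datasets, and the helper name `find_metrics_datasets` are assumptions, not a documented response format.

```python
def find_metrics_datasets(rows):
    """Return slugs of datasets whose dataset_type is 'metrics'.

    `rows` is a hypothetical, already-parsed form of the get_environment
    output: a list of dicts with 'slug' and 'dataset_type' keys.
    """
    return [row["slug"] for row in rows if row.get("dataset_type") == "metrics"]


# Sample rows; the "events" type label is illustrative only.
rows = [
    {"slug": "kafka-metrics", "dataset_type": "events"},
    {"slug": "refinery-metrics", "dataset_type": "events"},
    {"slug": "metrics", "dataset_type": "metrics"},
]
metrics_slugs = find_metrics_datasets(rows)
```

The selection keys off `dataset_type` only, so a slug containing "metrics" never qualifies on its name alone.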
Discovering Metrics and Their Attributes
Metrics datasets have a fundamentally different schema from event datasets. Each metric has its own set of resource and data point attributes. Two metrics in the same dataset may have completely different attributes available for filtering and grouping.
Workflow for discovering what to query:
1. **Find metric names:** Call `get_dataset_columns` on the metrics dataset (without `metric_name`). This returns metric names with their types in `MetricInfo`. Use `find_columns` with keywords to search for specific metrics (e.g., "cpu", "memory", "http request duration").
2. **Find attributes for a specific metric:** Call `get_dataset_columns` with the `metric_name` parameter set to the metric you want to query (e.g., `metric_name: "k8s.pod.memory.usage"`). This returns the resource attributes and data point attributes that co-occur with that metric, along with sample values. These are what you can use in WHERE and GROUP BY clauses.
3. **Validate before querying:** Not all attributes exist on all metrics. Always use step 2 to confirm available attributes before adding them to filters or breakdowns.
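The validation step can be expressed as a small check before a query is built. The sketch below assumes you have already collected the attribute names returned for one metric into a plain list. The helper name `split_attributes` and the sample attribute names are hypothetical.

```python
def split_attributes(requested, available):
    """Partition requested attribute names into (usable, missing).

    `available` is the list of attribute names reported for one specific
    metric; anything not in it should not be used in filters or breakdowns.
    """
    available_set = set(available)
    usable = [name for name in requested if name in available_set]
    missing = [name for name in requested if name not in available_set]
    return usable, missing


# Hypothetical attributes for a single metric.
available = ["k8s.pod.name", "k8s.namespace.name", "k8s.node.name"]
usable, missing = split_attributes(["k8s.pod.name", "service.name"], available)
```

Anything that lands in `missing` is a signal to re-check the metric's attributes before running the query.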
Allowed vs. Forbidden Operations on Metrics Datasets
The following operations are NOT allowed on metrics datasets:
| Forbidden Operation | Why |
|---|---|
| `COUNT` (without column) | Counts metric events, not metric values; meaningless for metrics |
| `RATE_SUM` | Not supported on metrics datasets |
| `RATE_AVG` | Not supported on metrics datasets |
| `RATE_MAX` | Not supported on metrics datasets |
| `CONCURRENCY` | Requires span duration; metrics have no duration |
Use these instead:
| Goal | Use on Metrics |
|---|---|
| Visualize a gauge value | AVG(metric), MAX(metric), HEATMAP(metric) |
| Visualize a counter/sum | SUM(metric), AVG(metric), MAX(metric) |
| See distribution of values | HEATMAP(metric), P50(metric), P99(metric) |
| Track per-second rate of change | Override temporal aggregation with a calculated field (see below) |
| Percentile analysis | P50(metric), P90(metric), P99(metric) |
| Count of non-null values | COUNT(metric) (with a column specified) |
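The rules in the two tables above lend themselves to a pre-flight check on a query's `calculations` list. The following is a rough sketch, not part of any Honeycomb tooling: the function name `check_calculations` and the message wording are made up for illustration, while the forbidden operations come from the table.

```python
FORBIDDEN_OPS = {"RATE_SUM", "RATE_AVG", "RATE_MAX", "CONCURRENCY"}


def check_calculations(calculations):
    """Return a list of human-readable problems for a metrics query."""
    problems = []
    for calc in calculations:
        op = calc.get("op", "").upper()
        if op in FORBIDDEN_OPS:
            problems.append(f"{op} is not allowed on metrics datasets")
        elif op == "COUNT" and not calc.get("column"):
            # COUNT(metric) with a column is fine; a bare COUNT is not.
            problems.append("COUNT needs a column on metrics datasets")
    return problems


problems = check_calculations([
    {"op": "COUNT"},
    {"op": "RATE_SUM", "column": "http.server.requests"},
    {"op": "AVG", "column": "k8s.pod.memory.usage"},
])
```

An empty result means none of the forbidden operations were found. It does not guarantee the query is otherwise meaningful.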
Metric Types and Temporal Aggregation
Honeycomb automatically applies temporal aggregation to align raw metric values into
query time steps. The function it applies depends on the metric type, visible in the
MetricInfo column from get_dataset_columns.
Default Temporal Aggregation by Metric Type
| MetricInfo | Type | Default Function | What It Does |
|---|---|---|---|
| `gauge` | Gauge | `LAST()` | Returns most recent value per time step |
| `sum(cumulative,monotonic)` | Monotonic cumulative sum | `INCREASE()` | Change between steps, handles counter resets |
| `sum(cumulative)` | Non-monotonic cumulative sum | `LAST()` | Most recent value (can go up or down) |
| `sum(delta)` or `sum(delta,monotonic)` | Delta sum | `SUMMARIZE()` | Sums all values in each step |
| `histogram(cumulative)` | Cumulative histogram | `INCREASE()` | Change per bucket between steps |
| `histogram(delta)` | Delta histogram | `SUMMARIZE()` | Sums bucket values in each step |
These defaults are applied automatically — you do not need to configure them.
The results you see from AVG, MAX, P99, etc. on a metrics dataset already
reflect temporal aggregation having been applied first.
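For quick reference while reading `MetricInfo` values, the default table can be written as a lookup. This is only a restatement of the table in Python. The helper name and the whitespace and case normalization are assumptions added for convenience.

```python
DEFAULT_TEMPORAL_AGGREGATION = {
    "gauge": "LAST",
    "sum(cumulative,monotonic)": "INCREASE",
    "sum(cumulative)": "LAST",
    "sum(delta)": "SUMMARIZE",
    "sum(delta,monotonic)": "SUMMARIZE",
    "histogram(cumulative)": "INCREASE",
    "histogram(delta)": "SUMMARIZE",
}


def default_temporal_aggregation(metric_info):
    """Map a MetricInfo string to its default temporal function name.

    Returns None for values not listed in the table.
    """
    key = metric_info.replace(" ", "").lower()
    return DEFAULT_TEMPORAL_AGGREGATION.get(key)
```

For example, `default_temporal_aggregation("sum(cumulative,monotonic)")` yields `"INCREASE"`, which is the case where overriding with `RATE` is most often useful.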
Overriding Temporal Aggregation
To override the default (e.g., to see RATE instead of INCREASE for a cumulative counter),
use a query-scoped calculated field wrapping the metric name in a temporal aggregation
function, then apply a spatial aggregation to that field in calculations.
```json
{
  "calculated_fields": [
    { "name": "req_rate", "expression": "RATE($http.server.requests, 300)" }
  ],
  "calculations": [
    { "op": "AVG", "column": "req_rate" }
  ]
}
```
Supported temporal aggregation functions for calculated fields:
- `LAST($metric)`: most recent data point per step (gauges, non-monotonic sums)
- `SUMMARIZE($metric)`: sum all values per step with interpolation (delta metrics)
- `INCREASE($metric[, range_interval_seconds])`: change in value across range, handles counter resets
- `RATE($metric[, range_interval_seconds])`: per-second rate of change (INCREASE / time)
The optional range_interval_seconds parameter (integer, in seconds) controls the lookback
window for calculating changes. Use it to smooth results or compensate for sparse data.
When omitted, the query's granularity is used as the range interval.
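To make the relationship between `INCREASE` and `RATE` concrete, here is a small Python sketch over raw cumulative counter samples. It assumes the reset rule used by Prometheus-style systems, where a drop in a cumulative counter is treated as a restart from zero. This document does not spell out Honeycomb's exact algorithm, so treat the reset handling, the helper names, and the sample values as assumptions.

```python
def increase(samples):
    """Total increase across cumulative counter samples, tolerating resets.

    Assumed reset rule: if a sample is lower than the previous one, the
    counter restarted from zero, so the new value itself is the increase.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        total += current - previous if current >= previous else current
    return total


def rate(samples, range_interval_seconds):
    """Per-second rate: the increase divided by the range interval."""
    return increase(samples) / range_interval_seconds


# Counter resets between 160 and 20: 30 + 30 + 20 (after reset) + 30.
samples = [100, 130, 160, 20, 50]
total_increase = increase(samples)
per_second = rate(samples, 300)
```

A longer `range_interval_seconds` spreads the same increase over more time, which is why it smooths the resulting rate.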
Important: You must still apply a spatial aggregation (AVG, SUM, P99, HEATMAP, etc.)
to the calculated field in calculations. The temporal aggregation function alone does not
produce a visualization — it transforms the raw metric values, then the spatial aggregation
summarizes across timeseries.
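The two-part structure (temporal function in a calculated field, spatial aggregation in `calculations`) can be generated programmatically. The builder below is a hypothetical helper that only assembles a dict in the same shape as the JSON example in this section. It does not call any Honeycomb API.

```python
import json


def rate_query(metric, spatial_op="AVG", range_interval_seconds=None,
               field_name="metric_rate"):
    """Build a query fragment that overrides temporal aggregation with RATE."""
    args = f"${metric}"
    if range_interval_seconds is not None:
        # Optional lookback window, an integer number of seconds.
        args += f", {int(range_interval_seconds)}"
    return {
        "calculated_fields": [
            {"name": field_name, "expression": f"RATE({args})"}
        ],
        # The spatial aggregation is still required on the calculated field.
        "calculations": [{"op": spatial_op, "column": field_name}],
    }


query = rate_query("http.server.requests", range_interval_seconds=300,
                   field_name="req_rate")
print(json.dumps(query, indent=2))
```

Keeping `spatial_op` as a required part of the output mirrors the rule above: the calculated field alone is never the whole query.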
For detailed reference on temporal aggregation functions, counter reset handling, and
range_interval_seconds, see:
${CLAUDE_PLUGIN_ROOT}/skills/metrics-queries/references/temporal-aggregation.md
Querying Histogram Metrics
OpenTelemetry histograms are stored as a collection of sub-fields. For a histogram
named http.server.duration, Honeycomb creates:
| Field | Meaning |
|---|---|
| `http.server.duration.count` | Total number of data points |
| `http.server.duration.sum` | Sum of all values |
| `http.server.duration.avg` | Mean value (sum/count) |
| `http.server.duration.p50` | Median (50th percentile) |
| `http.server.duration.p99` | 99th percentile |
| `http.server.duration.p001` through `.p999` | Full range of percentiles |
Two ways to query histograms:
1. Use the parent column name directly with percentile or distribution operations. This is the recommended approach: `{ "op": "P99", "column": "http.server.duration" }` or `{ "op": "HEATMAP", "column": "http.server.duration" }`.
2. Use sub-fields with MAX when you want the worst-case pre-computed percentile across all timeseries in a step: `{ "op": "MAX", "column": "http.server.duration.p99" }`. This returns the highest p99 value reported by any single timeseries in the time step, which differs from `P99(http.server.duration)`, which computes the 99th percentile across all data.
When to use which:
- For most analysis: use `P99(parent_column)` or `HEATMAP(parent_column)`
- For worst-case bounds across hosts/pods: use `MAX(parent_column.p99)`
- For throughput from histograms: use `SUM(parent_column.count)` or `AVG(parent_column.count)`
Query Math with Metrics
Query math (compound queries with named calculations and formulas) works on metrics datasets the same way it works on event datasets. Name your calculations, add per-calculation filters if needed, and define formulas to combine them.
Common metrics formula patterns:
Utilization percentage
```json
{
  "calculations": [
    { "op": "AVG", "column": "k8s.pod.memory.usage", "name": "used" },
    { "op": "AVG", "column": "k8s.pod.memory.available", "name": "available" }
  ],
  "formulas": [
    { "name": "utilization_pct", "expression": "$used / ($used + $available) * 100" }
  ],
  "breakdowns": ["k8s.pod.name"],
  "orders": [{ "column": "utilization_pct", "order": "descending" }],
  "limit": 20
}
```
Histogram tail ratio
```json
{
  "calculations": [
    { "op": "P50", "column": "http.server.duration", "name": "median" },
    { "op": "P99", "column": "http.server.duration", "name": "tail" }
  ],
  "formulas": [
    { "name": "tail_ratio", "expression": "$tail / $median" }
  ],
  "breakdowns": ["service.name"]
}
```
Error rate from counters (with temporal aggregation override)
```json
{
  "calculated_fields": [
    { "name": "error_rate", "expression": "RATE($http.server.errors)" },
    { "name": "request_rate", "expression": "RATE($http.server.requests)" }
  ],
  "calculations": [
    { "op": "SUM", "column": "error_rate", "name": "errors_per_sec" },
    { "op": "SUM", "column": "request_rate", "name": "requests_per_sec" }
  ],
  "formulas": [
    { "name": "error_pct", "expression": "$errors_per_sec / $requests_per_sec * 100" }
  ]
}
```
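The arithmetic behind the error-rate pattern can be traced by hand. The sketch below uses hypothetical per-timeseries rates, as if `RATE()` had already been applied, and mirrors the SUM-then-divide order of the query. The zero-request guard is an addition for the Python example and says nothing about how Honeycomb renders that case.

```python
# Hypothetical per-timeseries rates (events per second) after RATE().
error_rates = {"pod-a": 1.0, "pod-b": 1.5}
request_rates = {"pod-a": 200.0, "pod-b": 300.0}


def error_percentage(error_rates, request_rates):
    """Sum rates across timeseries, then divide, as the formula does."""
    errors_per_sec = sum(error_rates.values())      # SUM over timeseries
    requests_per_sec = sum(request_rates.values())  # SUM over timeseries
    if requests_per_sec == 0:
        return None  # nothing to divide by in this sketch
    return errors_per_sec / requests_per_sec * 100


error_pct = error_percentage(error_rates, request_rates)
```

With these numbers, 2.5 errors per second over 500 requests per second gives an error percentage of about 0.5.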
For more query examples, see:
${CLAUDE_PLUGIN_ROOT}/skills/metrics-queries/references/metrics-query-examples.md
Granularity for Metrics
Metrics arrive at known, regular intervals (e.g., every 10s, 30s, or 60s). Granularity matters more for metrics than for traces:
- Align granularity with the reporting interval. If metrics report every 60 seconds, use a granularity that is a multiple of 60 (e.g., 60, 120, 300). Misaligned granularity causes uneven bucket sizes that produce noisy results.
- Spiky-looking graphs usually mean the granularity is finer than the reporting interval. Use a coarser granularity (a larger step) or, in the UI, enable "Omit Missing Values" to produce continuous lines.
- RATE operations and granularity: `RATE_SUM` (on event datasets) is particularly sensitive to granularity choice, because an inconsistent number of data points per bucket produces variable results.
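One way to apply the alignment advice is to round a desired granularity up to the nearest multiple of the reporting interval. The helper below is a hypothetical convenience, assuming both values are whole seconds and that you already know the metric's reporting interval.

```python
def align_granularity(desired_seconds, reporting_interval_seconds):
    """Round the desired granularity up to a multiple of the reporting interval."""
    if desired_seconds <= 0 or reporting_interval_seconds <= 0:
        raise ValueError("both values must be positive integers")
    # Ceiling division without floating point.
    multiples = -(-desired_seconds // reporting_interval_seconds)
    return multiples * reporting_interval_seconds
```

For example, a desired 90-second step with a 60-second reporting interval becomes 120 seconds, so each bucket holds the same number of data points.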
Common Pitfalls
- Using `COUNT` on metrics. `COUNT` counts the number of metric events, not the metric value. Use `AVG`, `SUM`, `MAX`, or `HEATMAP` instead.
- Using `RATE_AVG`/`RATE_SUM`/`RATE_MAX` on metrics datasets. These are not allowed. To get a rate, use a calculated field with `RATE($metric)` and then apply a spatial aggregation like `AVG` or `SUM`.
- Assuming all metrics share the same attributes. Each metric has its own set of resource and data point attributes. Always call `get_dataset_columns` with `metric_name` to discover what's available for a specific metric before adding filters or breakdowns.
- Confusing event datasets with the metrics dataset. Datasets named `kafka-metrics`, `refinery-metrics`, etc. are event datasets. Check `dataset_type` from `get_environment`.
- Querying histogram sub-fields when the parent column works. Use `P99(http.server.duration)` rather than an aggregate of a sub-field such as `AVG(http.server.duration.p99)`. Reach for `MAX(http.server.duration.p99)` only when you specifically need worst-case bounds.
- Not specifying an aggregate function. Metrics queries without a spatial aggregation in SELECT default to `COUNT`, which is meaningless for metrics.
Additional Resources
Reference Files
- `${CLAUDE_PLUGIN_ROOT}/skills/metrics-queries/references/metrics-query-examples.md`: Metrics query cookbook with run_query examples for common scenarios
- `${CLAUDE_PLUGIN_ROOT}/skills/metrics-queries/references/temporal-aggregation.md`: Deep reference on temporal aggregation functions, counter resets, and range_interval_seconds
- `${CLAUDE_PLUGIN_ROOT}/skills/metrics-queries/references/metric-types.md`: OpenTelemetry metric types, how they map to Honeycomb, and what the MetricInfo values mean
Cross-References
- For general query construction patterns (calculated fields, relational fields, result interpretation): query-patterns skill
- For investigating production issues using metrics alongside traces: production-investigation skill
- For SLO interpretation and burn alert design: slos-and-triggers skill
- For instrumenting applications to send metrics: otel-instrumentation skill
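The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation was produced automatically with the Claude API.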