
🔍querying-data-lake

Plugin: aws-data-analytics
Arguments: [SQL-query|query-name|workgroup-name|catalog-name|'profile TABLE_NAME']

Description

Execute and manage Athena SQL queries across default and federated catalogs (Glue, S3 Tables, Redshift). Triggers on phrases like: query data, run SQL, athena query, analyze table, SQL query, workgroup status, profile table, query Redshift catalog, query S3 Tables. Do NOT use for finding specific data assets (use finding-data-lake-assets), full catalog audits (use exploring-data-catalog), importing data (use ingesting-into-data-lake).

Use cases

✓ Run SQL queries across multiple catalogs
✓ Analyze tables with Athena
✓ Query a Redshift catalog
✓ Query S3 Tables
✓ Check workgroup status

Query Data Lake

Execute SQL queries on Amazon Athena across default and federated catalogs (Glue, S3 Tables, Redshift) with workgroup selection, statement classification, and error recovery.

Overview

Executes and manages Athena SQL queries across default and federated catalogs. Selects a workgroup, resolves target assets (delegating fuzzy references to finding-data-lake-assets), classifies statements for safety, and reports cost and data scanned. Use the AWS MCP server for sandboxed execution and audit logging; the same AWS CLI commands work directly when the MCP server is not available.

Constraints for parameter acquisition:

- You MUST accept a single optional argument: SQL text, a named-query name, a workgroup name, a catalog name, or `profile TABLE_NAME`
- You MUST accept the argument as direct text or as a pointer to a file containing SQL
- You MUST ask the user for the target AWS region if it is not already set
- You MUST confirm the output S3 location before executing any non-trivial query
- You MUST respect the user's decision to abort at any step

Common Tasks

1. Verify Dependencies

Check for required tools and AWS access before running queries.

Constraints:

- You MUST verify that the AWS MCP server tools (`aws___call_aws`) are available and run queries through them when present; fall back to the AWS CLI only if the MCP server is unavailable
- You MUST NOT fall back to shell or Bash for query execution; results must be captured via the MCP tool or the `aws athena` CLI so that output location and cost are tracked
- You MUST confirm credentials with `aws sts get-caller-identity` and inform the user about any missing tools

2. Resolve Workgroup

Check the caller identity, list workgroups, and auto-select the best one (see workgroup-selection.md).

Constraints:

- You MUST select a workgroup before submitting any query (prevents output-location errors)
- You MUST present the selected workgroup and its output location to the user
- You MUST NOT auto-escalate to a different workgroup on failure without user confirmation
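
The selection criteria themselves live in workgroup-selection.md, which is not reproduced here. As a minimal sketch only, assuming the rule is "first enabled workgroup that has an output location configured", the choice could look like the following. The function name `pick_workgroup` is hypothetical, and the input is assumed to be a list of `WorkGroup` objects shaped like the output of `aws athena get-work-group`.

```python
from typing import Optional


def pick_workgroup(workgroups: list) -> Optional[dict]:
    """Return the first ENABLED workgroup with an output location, or None.

    Assumption: this ordering rule is illustrative; the real criteria are
    defined in workgroup-selection.md.
    """
    for wg in workgroups:
        location = (
            wg.get("Configuration", {})
            .get("ResultConfiguration", {})
            .get("OutputLocation")
        )
        if wg.get("State") == "ENABLED" and location:
            # Both values are returned so they can be presented to the user.
            return {"name": wg["Name"], "output_location": location}
    # No candidate: ask the user instead of switching workgroups silently.
    return None
```

Returning `None` rather than guessing keeps the "no auto-escalation without confirmation" constraint intact: the caller has to go back to the user.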

3. Resolve the Target Asset

If the user refers to a table by name, by business concept ("our quarterly report", "the sales data"), by S3 path, or by catalog without specifying the table, delegate to finding-data-lake-assets to return the concrete database.table (and catalog if non-default).

Constraints:

- You MUST NOT attempt to resolve fuzzy asset references with `athena list-data-catalogs` or by iterating `get-tables`; those miss federated catalogs and waste tokens
- You SHOULD skip this step only when the user provides a fully-qualified reference (exact `database.table`) or raw SQL they want executed as-is
- You MUST state the resolved asset explicitly before building the query: "Found [table] in [catalog]. Using this for the query."
- You SHOULD default to the default Glue catalog unless the user mentions "federated", "Redshift", "S3 Tables", or `finding-data-lake-assets` returns a different catalog

4. Discover Schema

For analytical queries, you SHOULD profile the target table before building the final query. You MUST show sample rows (`SELECT ... LIMIT 5`) as part of profiling.

5. Build Query

Table addressing depends on catalog type:

- Default Glue catalog: `database.table` (omit the catalog prefix for single-catalog queries). In cross-catalog queries, qualify default-catalog tables as `"awsdatacatalog".database.table`.
- Registered data source: `datasource.database.table`
- Unregistered Glue catalog: `"catalog/subcatalog".database.table`
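
The three addressing forms can be sketched as one small helper. This is illustrative only: `qualify_table` is a hypothetical name, and treating any catalog that contains a `/` as an unregistered multi-level Glue catalog is an assumption made for the example, not a rule stated by the skill.

```python
from typing import Optional


def qualify_table(database: str, table: str,
                  catalog: Optional[str] = None,
                  cross_catalog: bool = False) -> str:
    """Build a table reference for the given catalog type."""
    if catalog is None:
        # Default Glue catalog: prefix only when other catalogs are involved.
        prefix = '"awsdatacatalog".' if cross_catalog else ""
    elif "/" in catalog:
        # Assumption: a slash marks a catalog/subcatalog path, which is quoted.
        prefix = f'"{catalog}".'
    else:
        # Registered data source: plain three-part name.
        prefix = f"{catalog}."
    return f"{prefix}{database}.{table}"
```

For example, `qualify_table("sales", "orders")` yields `sales.orders`, while passing `cross_catalog=True` yields `"awsdatacatalog".sales.orders`.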

6. Classify and Execute

Classify the SQL statement before executing:

| Statement | Behavior |
| --- | --- |
| `SELECT`, `SHOW`, `DESCRIBE`, `EXPLAIN` | Safe: execute |
| `INSERT`, `UPDATE`, `DELETE`, `DROP`, `ALTER`, `CREATE`, `TRUNCATE`, `MERGE` | Destructive: warn the user and require explicit confirmation |
| Unsure | Treat as destructive; confirm |
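
A minimal sketch of this classification, assuming that looking at the first keyword after stripping SQL comments is good enough. The function name `classify_statement` and the comment-stripping regex are illustrative choices, not part of the skill.

```python
import re

SAFE = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}
DESTRUCTIVE = {"INSERT", "UPDATE", "DELETE", "DROP",
               "ALTER", "CREATE", "TRUNCATE", "MERGE"}

# Matches "-- line" comments and "/* block */" comments.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def classify_statement(sql: str) -> str:
    """Return "safe" or "destructive" based on the leading keyword."""
    stripped = _COMMENTS.sub(" ", sql).strip()
    match = re.match(r"[A-Za-z]+", stripped)
    keyword = match.group(0).upper() if match else ""
    if keyword in SAFE:
        return "safe"
    # Known destructive keywords and anything unrecognized both need
    # confirmation, matching the "Unsure" row of the table.
    return "destructive"
```

This is deliberately conservative: a statement that starts with `WITH` is not recognized, so it is routed to confirmation rather than executed silently.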

Example tool call (via AWS MCP server):

aws___call_aws(command="aws athena start-query-execution --work-group <WORKGROUP_NAME> --query-string '<sql>' --query-execution-context Database=<db>")

For federated or S3 Tables catalogs, also set Catalog=<CATALOG_PATH> in the execution context (e.g. Catalog=s3tablescatalog/<BUCKET_NAME>).
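
As a sketch of how the command string above might be assembled, the helper below builds the same `aws athena start-query-execution` call and adds `Catalog=` to the execution context only when a catalog is given. The function name `build_start_query_command` is hypothetical; `shlex.join` is used so that the SQL text is quoted safely for a shell-style command string.

```python
import shlex
from typing import Optional


def build_start_query_command(sql: str, workgroup: str, database: str,
                              catalog: Optional[str] = None) -> str:
    """Return the CLI command string for submitting one query."""
    # AWS CLI shorthand syntax for the QueryExecutionContext structure.
    context = f"Database={database}"
    if catalog:
        context += f",Catalog={catalog}"
    argv = [
        "aws", "athena", "start-query-execution",
        "--work-group", workgroup,
        "--query-string", sql,
        "--query-execution-context", context,
    ]
    return shlex.join(argv)
```

The resulting string can be passed as the `command` argument of `aws___call_aws`, or run with the AWS CLI when the MCP server is unavailable.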

Constraints:

- You MUST warn the user before executing when the target is Redshift-federated ("No partition pruning; every query scans the full table")
- You MUST warn the user before executing a cross-catalog join ("Cross-catalog joins incur network overhead and may be slow")
- You MUST confirm the output S3 location before executing
- You MUST explain which tool is being called before executing
- You MUST respect the user's decision to abort

7. Present and Recover

Present results with cost, data scanned, duration, and actionable insights. On failure, list available workgroups and let the user choose which to retry with.
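
One way to derive the cost, data-scanned, and duration figures is from the `Statistics` block returned by `aws athena get-query-execution`. The sketch below is an estimate under stated assumptions: the price of 5 USD per TB scanned, the 10 MB minimum per query, and the 1024-based TB are all assumptions that vary by region and pricing model, and `summarize_execution` is a hypothetical name.

```python
PRICE_PER_TB_USD = 5.0             # assumption: on-demand price, check your region
BYTES_PER_TB = 1024 ** 4           # assumption: binary terabyte
MIN_BILLED_BYTES = 10 * 1024 ** 2  # assumption: 10 MB minimum per query


def summarize_execution(response: dict) -> dict:
    """Summarize a get-query-execution response for presentation."""
    execution = response["QueryExecution"]
    stats = execution.get("Statistics", {})
    scanned = stats.get("DataScannedInBytes", 0)
    # Queries that scan nothing (for example DDL) are treated as free here.
    billed = max(scanned, MIN_BILLED_BYTES) if scanned else 0
    return {
        "state": execution["Status"]["State"],
        "data_scanned_bytes": scanned,
        "duration_seconds": stats.get("TotalExecutionTimeInMillis", 0) / 1000,
        "estimated_cost_usd": round(billed / BYTES_PER_TB * PRICE_PER_TB_USD, 6),
    }
```

Keeping the pricing constants at module level makes it easy to override them when the workgroup's actual pricing differs.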

Argument Routing

Resolve in this order; stop at the first match:

1. Contains SQL keywords (`SELECT`, `SHOW`, `DESCRIBE`, `INSERT`, etc.): treat as SQL text and execute directly
2. `profile TABLE_NAME`: run comprehensive table profiling (see query-patterns.md)
3. Matches a known named query: look up and execute
4. Matches a known workgroup: show workgroup status and recent queries
5. Matches a known catalog: delegate to `exploring-data-catalog` to enumerate databases and tables
6. No args: show recent query activity and available tables
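
The first-match-wins order can be sketched as a small router. This is a hedged illustration: `route_argument` and the returned labels are hypothetical, the keyword list is assumed to be the union of the statements named in the classification table, and the sets of known named queries, workgroups, and catalogs are expected to be looked up beforehand.

```python
import re
from typing import Collection, Optional

# Assumption: "SQL keywords" means the statements listed in the
# classification table.
_SQL_KEYWORD = re.compile(
    r"\b(SELECT|SHOW|DESCRIBE|EXPLAIN|INSERT|UPDATE|DELETE|DROP|ALTER|"
    r"CREATE|TRUNCATE|MERGE)\b",
    re.IGNORECASE,
)


def route_argument(arg: Optional[str],
                   named_queries: Collection[str] = (),
                   workgroups: Collection[str] = (),
                   catalogs: Collection[str] = ()) -> str:
    """Return a label for the first routing rule that matches."""
    if arg is None or not arg.strip():
        # Checked first for convenience; no other rule matches an empty arg.
        return "overview"
    text = arg.strip()
    if _SQL_KEYWORD.search(text):
        return "execute_sql"
    if re.fullmatch(r"profile\s+\S+", text, re.IGNORECASE):
        return "profile_table"
    if text in named_queries:
        return "named_query"
    if text in workgroups:
        return "workgroup_status"
    if text in catalogs:
        return "delegate_catalog"
    # Not covered by the routing list: ask the user what they meant.
    return "unresolved"
```

Because the keyword check runs first, a named query whose name contains a standalone SQL keyword (for example `insert-daily`) would be routed as SQL text; that is a consequence of the stated order, not something this sketch tries to correct.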

Principles

- Always select a workgroup before executing (prevents output-location errors)
- Profile unfamiliar tables before running analytical queries
- Present cost alongside results so users build cost awareness
- Suggest `LIMIT` for exploratory queries on large tables
- Never ask domain questions with obvious answers, but always confirm security-relevant actions (workgroup switches, output location changes, non-`SELECT` statements)

Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| Redshift identifier error with mixed case | Redshift-federated names are lowercase only | Lowercase the identifier |
| `CatalogId` validation failure | ARN passed instead of catalog name | Pass the catalog name, not the ARN |
| Cross-catalog `information_schema` returns nothing | Missing catalog qualifier | Use catalog-qualified path: `"catalog".information_schema.tables` |
| Query fails with output-location error | Workgroup has no output location configured | Select a different workgroup with an output location, or configure one |
| Destructive statement executed without confirmation | Statement classification skipped | Always classify `INSERT`/`UPDATE`/`DELETE`/`DROP`/`ALTER`/`CREATE`/`TRUNCATE`/`MERGE` and confirm with the user |


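The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation was produced automatically with the Claude API.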