🏗️creating-data-lake-table

Plugin: aws-data-analytics
Arguments: [table-description|schema-spec]
Description

Create managed Iceberg tables using Amazon S3 Tables (s3tables API namespace) with automatic compaction and snapshot management. Sets up the table bucket, namespace, table, schema, Glue catalog registration, partitioning, and IAM access control. Triggers on: create table, data lake table, analytics table, structured data storage, S3 Tables, Iceberg, Athena table, partitioning strategy, access permissions. Do NOT use for: importing files (use ingesting-into-data-lake), vector storage (use storing-and-querying-vectors), querying existing tables (use querying-data-lake), or locating existing tables (use finding-data-lake-assets).

Use cases

✓ Creating a managed Iceberg table
✓ Setting up the table structure for a data lake
✓ Implementing a partitioning strategy on S3
✓ Configuring IAM access control
✓ Registering a table in the Glue catalog

Create Data Lake Tables with Amazon S3 Tables

Overview

Amazon S3 Tables provides managed Iceberg tables with automatic compaction and snapshot management. Queryable via Athena and Iceberg-compatible engines.

Common Tasks

You MUST use AWS MCP server tools when connected; they provide command validation, sandboxed execution, and audit logging. Fall back to the AWS CLI if MCP is unavailable.

Decision Guide

Before creating anything, you MUST check what already exists:

You MUST run aws glue get-tables --database-name <NAME> when user mentions a database.

What you find	Action
Fuzzy database name ("our analytics db")	You MUST STOP. Delegate to `finding-data-lake-assets` to resolve.
Non-S3-Tables table with matching name	You MUST STOP. Delegate to `finding-data-lake-assets`. You MUST NOT create until user confirms.
Existing S3 Tables table with matching name	You MUST check schema match. Reuse if compatible, recreate only if user confirms.
No matching tables	Proceed with creation (Steps 1-8).
User explicitly requests new S3 Tables table	Skip checks, proceed with creation.
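
The decision table above can be read as a small dispatch function. The following is a minimal sketch of that reading, not part of the skill itself; the `Finding` enum, the `next_action` function, and the returned action strings are hypothetical names introduced only for illustration.

```python
from enum import Enum


class Finding(Enum):
    """Hypothetical labels for the rows of the decision table."""
    FUZZY_DATABASE_NAME = "fuzzy database name"
    NON_S3_TABLES_MATCH = "non-S3-Tables table with matching name"
    S3_TABLES_MATCH = "existing S3 Tables table with matching name"
    NO_MATCH = "no matching tables"


def next_action(finding: Finding, explicit_new_table: bool = False) -> str:
    """Return the action the decision table prescribes for a finding."""
    # An explicit request for a new S3 Tables table skips the checks.
    if explicit_new_table:
        return "create"
    if finding in (Finding.FUZZY_DATABASE_NAME, Finding.NON_S3_TABLES_MATCH):
        # Hand off to finding-data-lake-assets; do not create anything yet.
        return "stop-and-delegate"
    if finding is Finding.S3_TABLES_MATCH:
        # Reuse if the schema is compatible; recreate only on confirmation.
        return "check-schema"
    return "create"
```

Keeping the explicit-request case first mirrors the last table row, which overrides the other checks.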

Creation paths:

Existing data in S3: Create empty table (Steps 1-8), then use ingesting-into-data-lake skill.
Glue ETL pipeline: Read references/table-creation-glue-etl.md first, then Steps 1-6.
Lake Formation access control: Search AWS docs for "S3 Tables integration with Lake Formation".

1. Verify Dependencies

Constraints:

You MUST check whether AWS MCP server tools or AWS CLI are available and inform user if missing
You MUST confirm target AWS region and verify credentials with aws sts get-caller-identity
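
When falling back to the AWS CLI, the two checks above can be scripted. This is a sketch under the assumption that the CLI is installed as `aws` on PATH; the function names are hypothetical, and a zero exit code from `aws sts get-caller-identity` is treated as "credentials work".

```python
import shutil
import subprocess


def aws_cli_available() -> bool:
    """Return True if an `aws` executable is found on PATH."""
    return shutil.which("aws") is not None


def caller_identity_command(region: str) -> list[str]:
    """Build the credential check command for the given region."""
    return ["aws", "sts", "get-caller-identity", "--region", region, "--output", "json"]


def verify_credentials(region: str) -> bool:
    """Run the check; True means the CLI exited successfully."""
    if not aws_cli_available():
        return False
    result = subprocess.run(
        caller_identity_command(region), capture_output=True, text=True
    )
    return result.returncode == 0
```

If `aws_cli_available()` returns False and no MCP tools are connected, the user should be informed before any further step.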

2. Understand the Schema

Explicit schema: Validate Iceberg types.
Loose description: Ask columns, types, grain. Propose and confirm.
Existing S3 data: Infer schema from file headers only. Create empty table first, then use ingesting-into-data-lake skill.

Constraints:

You MUST read references/best-practices.md for Iceberg type mapping, partitions, and naming.
You MUST ask for all required parameters upfront: table name, columns, types, partition strategy. For schema evolution, see references/athena-ddl-path.md.
You MUST use all lowercase names -- Glue rejects mixed case with GENERIC_INTERNAL_ERROR. Namespace and table names MUST NOT contain hyphens.
You SHOULD suggest partition columns based on access patterns.
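
The naming constraint can be checked before any API call. The sketch below is illustrative only: `name_problems` is a hypothetical helper, and the allowed character set (lowercase letters, digits, underscores) is an assumption, since the text above only requires lowercase names without hyphens.

```python
import re

# Assumption: lowercase letters, digits, and underscores are accepted.
# The text above only states "all lowercase" and "no hyphens".
_NAME_RE = re.compile(r"[a-z0-9_]+")


def name_problems(name: str) -> list[str]:
    """Return the reasons a namespace or table name would be rejected."""
    problems = []
    if name != name.lower():
        problems.append("contains uppercase characters")
    if "-" in name:
        problems.append("contains a hyphen")
    if not problems and not _NAME_RE.fullmatch(name):
        problems.append("contains characters outside [a-z0-9_]")
    return problems
```

An empty list means the name passed these checks; it does not guarantee that the service accepts it.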

3. Create Table Bucket

Names: 3-63 characters; lowercase letters, numbers, and hyphens only.

aws s3tables create-table-bucket --name <BUCKET_NAME> --region <REGION>

Capture table-bucket-arn. Encryption (SSE-S3 by default, or SSE-KMS) and storage class (STANDARD or INTELLIGENT_TIERING) are set at creation. See references/best-practices.md.

Constraints:

You MUST check existing buckets with aws s3tables list-table-buckets and ask user to select or create new.
If using SSE-KMS, KMS key policy MUST allow S3 Tables maintenance service principal to read data. Search AWS docs for "S3 Tables KMS key policy" for required policy.
If bucket creation fails, see references/best-practices.md for common errors.
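
The bucket naming rule stated at the start of this step translates directly into a regular expression. This sketch encodes only that rule (3-63 characters, lowercase letters, numbers, hyphens); `is_valid_table_bucket_name` is a hypothetical helper, and any further service-side restrictions are not modelled here.

```python
import re

# Encodes only the rule stated in this step; nothing more.
_BUCKET_RE = re.compile(r"[a-z0-9-]{3,63}")


def is_valid_table_bucket_name(name: str) -> bool:
    """Check length (3-63) and character set (lowercase, digits, hyphens)."""
    return _BUCKET_RE.fullmatch(name) is not None
```

Note that hyphens are allowed here but not in namespace or table names.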

4. Create Namespace

aws s3tables create-namespace --table-bucket-arn <ARN> --namespace <NAMESPACE>

Constraints:

You MUST list existing namespaces first and suggest reusing if relevant
You MUST use lowercase names with no hyphens

5. Create Glue Data Catalog Integration

Check if s3tablescatalog exists (create once per region per account):

aws glue get-catalog --catalog-id s3tablescatalog

If not found, create it (requires glue:CreateCatalog and glue:PassConnection):

aws glue create-catalog --name "s3tablescatalog" --catalog-input '{
  "FederatedCatalog": {
    "Identifier": "arn:aws:s3tables:<REGION>:<ACCOUNT_ID>:bucket/*",
    "ConnectionName": "aws:s3tables"
  },
  "CreateDatabaseDefaultPermissions": [{"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}, "Permissions": ["ALL"]}],
  "CreateTableDefaultPermissions": [{"Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"}, "Permissions": ["ALL"]}],
  "AllowFullTableExternalDataAccess": "True"
}'

Verify with aws glue get-catalogs --parent-catalog-id s3tablescatalog.
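
Hand-editing the `--catalog-input` JSON is error-prone, so it can help to generate it. The sketch below reproduces the payload shown above with the region and account ID filled in; the function names are hypothetical and the command is only built, not executed.

```python
import json


def s3tables_catalog_input(region: str, account_id: str) -> dict:
    """Build the catalog input shown above for one region and account."""
    allow_all = [{
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Permissions": ["ALL"],
    }]
    return {
        "FederatedCatalog": {
            "Identifier": f"arn:aws:s3tables:{region}:{account_id}:bucket/*",
            "ConnectionName": "aws:s3tables",
        },
        "CreateDatabaseDefaultPermissions": allow_all,
        "CreateTableDefaultPermissions": allow_all,
        "AllowFullTableExternalDataAccess": "True",
    }


def create_catalog_command(region: str, account_id: str) -> list[str]:
    """Return the CLI arguments as a list, suitable for subprocess.run."""
    return [
        "aws", "glue", "create-catalog",
        "--name", "s3tablescatalog",
        "--catalog-input", json.dumps(s3tables_catalog_input(region, account_id)),
        "--region", region,
    ]
```

Passing the arguments as a list avoids the shell quoting problems that the single-quoted JSON in the CLI form can cause.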

6. Configure Access Control

S3 Tables uses s3tables:* IAM namespace (not s3:*).

Querying principal permissions (bucket policy):

s3tables:GetTableBucket, s3tables:GetNamespace, s3tables:GetTable, s3tables:GetTableMetadataLocation, s3tables:GetTableData

Querying principal permissions (IAM policy):

glue:GetCatalog, glue:GetDatabase, glue:GetTable

You MUST scope to correct ARN patterns. You MUST read references/access-control.md for exact resource ARNs.

Constraints:

You MUST ask user for querying principal ARN
You MUST NOT grant broader permissions than necessary
You MUST NOT create IAM roles automatically; verify existing roles and guide the user
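
The two permission lists above can be assembled into policy documents once the principal and resource ARNs are known. This is a sketch only: `table_bucket_policy` and `glue_identity_policy` are hypothetical helpers, and the resource ARNs are deliberately left as parameters because the exact patterns come from references/access-control.md, not from this example.

```python
QUERY_BUCKET_ACTIONS = [
    "s3tables:GetTableBucket",
    "s3tables:GetNamespace",
    "s3tables:GetTable",
    "s3tables:GetTableMetadataLocation",
    "s3tables:GetTableData",
]
QUERY_GLUE_ACTIONS = ["glue:GetCatalog", "glue:GetDatabase", "glue:GetTable"]


def _require_scoped(resource_arns: list[str]) -> None:
    """Reject empty or fully wildcarded resource lists."""
    if not resource_arns or "*" in resource_arns:
        raise ValueError("resource ARNs must be explicit, not empty or '*'")


def table_bucket_policy(principal_arn: str, resource_arns: list[str]) -> dict:
    """Resource-based policy granting the read actions to one principal."""
    _require_scoped(resource_arns)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": QUERY_BUCKET_ACTIONS,
            "Resource": resource_arns,
        }],
    }


def glue_identity_policy(resource_arns: list[str]) -> dict:
    """Identity-based policy granting the Glue read actions."""
    _require_scoped(resource_arns)
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": QUERY_GLUE_ACTIONS,
            "Resource": resource_arns,
        }],
    }
```

The generated documents are meant for the user to review and attach; the sketch does not create roles or apply policies.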

7. Create the Table

Context	Path
Default (any user)	S3 Tables API (below)
User specifically wants SQL DDL	Athena DDL (see `references/athena-ddl-path.md`)
Glue ETL pipeline	Spark DDL via `--conf` job args (not `spark.conf.set()`). You MUST read `references/table-creation-glue-etl.md` for the `--conf` string.

Default: S3 Tables API:

aws s3tables create-table \
  --table-bucket-arn <ARN> \
  --namespace <NAMESPACE> \
  --name <TABLE_NAME> \
  --format ICEBERG \
  --metadata '<METADATA_JSON>'

Metadata JSON MUST nest under "iceberg" key:

{"iceberg":{"schema":{"fields":[
  {"name":"order_date","type":"date","required":true},
  {"name":"customer_id","type":"string","required":true},
  {"name":"amount","type":"double","required":false}
]},
"partitionSpec":{"fields":[
  {"sourceId":1,"fieldId":1000,"transform":"month","name":"order_date_month"}
]}}}

Constraints:

partitionSpec.sourceId MUST reference a valid schema field ID
For schema evolution after creation, use Athena DDL. See references/athena-ddl-path.md
You MUST use schemaV2 for complex types (list, map, struct) with explicit field IDs. See references/best-practices.md.
You SHOULD search AWS docs for "IcebergPartitionField S3 Tables" for supported partition transforms
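
To keep `partitionSpec.sourceId` consistent with the schema, the metadata JSON can be generated rather than typed. The sketch below handles primitive columns and a single partition field; `build_iceberg_metadata` is a hypothetical helper, and it assumes field IDs are assigned 1, 2, 3, ... in column order, which matches the example above (sourceId 1 for order_date) but is not confirmed by the text.

```python
import json


def build_iceberg_metadata(columns, partition=None):
    """Build the --metadata payload.

    columns: list of (name, type, required) tuples.
    partition: optional (column_name, transform) tuple.
    """
    fields = [
        {"name": name, "type": type_, "required": required}
        for name, type_, required in columns
    ]
    metadata = {"iceberg": {"schema": {"fields": fields}}}
    if partition is not None:
        column, transform = partition
        names = [name for name, _, _ in columns]
        if column not in names:
            raise ValueError(f"partition column {column!r} is not in the schema")
        # Assumption: field IDs follow column order, starting at 1.
        source_id = names.index(column) + 1
        metadata["iceberg"]["partitionSpec"] = {"fields": [{
            "sourceId": source_id,
            "fieldId": 1000,
            "transform": transform,
            "name": f"{column}_{transform}",
        }]}
    return metadata


columns = [
    ("order_date", "date", True),
    ("customer_id", "string", True),
    ("amount", "double", False),
]
metadata_json = json.dumps(build_iceberg_metadata(columns, ("order_date", "month")))
```

`metadata_json` can then be passed as the value of `--metadata`. Complex types are out of scope for this helper, since they require schemaV2 with explicit field IDs.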

8. Verify and Confirm

You MUST verify with aws s3tables get-table and confirm queryability with DESCRIBE <table_name> via Athena using --query-execution-context '{"Catalog":"s3tablescatalog/<BUCKET_NAME>","Database":"<NAMESPACE>"}'. Do NOT put catalog in SQL. Present summary: bucket ARN, namespace, table, schema, partitions.
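
The Athena check can be expressed as a command builder that keeps the catalog in the execution context and out of the SQL. This is a sketch: `describe_table_command` is a hypothetical helper, and the optional `workgroup` parameter is an assumption about how the query result location is configured in a given account.

```python
import json


def describe_table_command(bucket_name, namespace, table_name, workgroup=None):
    """Build the `aws athena start-query-execution` call for DESCRIBE."""
    context = {
        "Catalog": f"s3tablescatalog/{bucket_name}",
        "Database": namespace,
    }
    cmd = [
        "aws", "athena", "start-query-execution",
        # The SQL names only the table; the catalog lives in the context.
        "--query-string", f"DESCRIBE {table_name}",
        "--query-execution-context", json.dumps(context),
    ]
    if workgroup is not None:
        cmd += ["--work-group", workgroup]
    return cmd
```

The returned list can be passed to `subprocess.run`; the query's final state still has to be polled separately.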

Troubleshooting

Error	Cause	Fix
"Table location can not be specified"	LOCATION in CREATE TABLE	Remove LOCATION clause. S3 Tables manages storage automatically.
`AccessDeniedException` with `s3:*` policy	Using `s3:` not `s3tables:`	S3 Tables uses `s3tables:*` namespace. Update IAM policy.
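
The second row of the table is easy to detect mechanically. The following sketch scans a policy document for actions in the `s3:` namespace; `s3_prefixed_actions` is a hypothetical helper, and it only inspects `Action` entries, not `NotAction` or conditions.

```python
def s3_prefixed_actions(policy: dict) -> list[str]:
    """Return actions that use the s3: namespace instead of s3tables:."""
    found = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a dict
        statements = [statements]
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a string
            actions = [actions]
        found.extend(a for a in actions if a.lower().startswith("s3:"))
    return found
```

A non-empty result suggests that the policy was written for general-purpose S3 buckets and needs the corresponding `s3tables:` actions.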

Additional Resources

access-control.md -- IAM permissions, ARN patterns, permission errors
best-practices.md -- Iceberg types, partitions, naming, common errors
athena-ddl-path.md -- Athena DDL, schema evolution
table-creation-glue-etl.md -- Spark DDL via Glue ETL
Loading data: ingesting-into-data-lake skill

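The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was produced automatically with the Claude API.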