Skill · Official · development

🗺️aidp-bucket-mapping

Plugin: oracle-ai-data-platform-workbench-databricks-migrator

Description

Configure the `s3://` → `oci://` bucket and namespace mapping that the migrator uses to rewrite paths during notebook and catalog migration. Use it when (a) the user has external tables or files at `s3://` paths that need to land on OCI Object Storage, (b) `check_data_availability` reports `"S3 bucket X not found in OCI bucket mapping"`, or (c) the DDL rewriter logs a missing-bucket warning.

Use cases

✓ Migrate S3 paths to OCI Object Storage
✓ Resolve bucket-mapping errors
✓ Respond to DDL rewriter warnings

`aidp-bucket-mapping` — wire up `s3://` ↔ `oci://`

Several places in the migrator look up s3://<bucket>/<path> and rewrite to oci://<bucket>@<namespace>/<path>:

- `aidp-migrate-catalog` rewrites external-table `LOCATION` clauses.
- `aidp-migrate-job` Pass-1 rewrites `spark.read.parquet("s3://...")` literals in notebook cells.
- `aidp-check-data` probes paths via the same translation.

All consult the same bucket-mapping config.
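
The rewrite described above can be pictured with a minimal sketch. This is not the migrator's own code: `translate_s3_path`, `UnknownBucketError`, and the sample bucket and namespace names are hypothetical, and the error text only mirrors the message quoted in this document.

```python
class UnknownBucketError(LookupError):
    """Raised when an S3 bucket has no entry in the mapping (hypothetical name)."""


def translate_s3_path(path: str, mapping: dict) -> str:
    """Rewrite s3://<bucket>/<path> to oci://<bucket>@<namespace>/<path>."""
    prefix = "s3://"
    if not path.startswith(prefix):
        return path  # not an S3 path; leave it untouched
    # Split "bucket/key/parts" into the bucket name and the remaining key.
    bucket, _, key = path[len(prefix):].partition("/")
    entry = mapping["buckets"].get(bucket)
    if entry is None:
        known = sorted(mapping["buckets"])
        raise UnknownBucketError(
            f'S3 bucket "{bucket}" not found in OCI bucket mapping. '
            f"Known buckets: {known}"
        )
    return f"oci://{entry['oci_bucket']}@{entry['oci_namespace']}/{key}"


# Made-up sample mapping with the same shape as the JSON config file.
SAMPLE_MAPPING = {
    "buckets": {
        "raw-events": {"oci_bucket": "raw-events-oci", "oci_namespace": "examplens"},
    },
}

print(translate_s3_path("s3://raw-events/2024/part-0.parquet", SAMPLE_MAPPING))
```

Because every tool consults the same mapping, a single lookup function of this kind would give the catalog rewriter, the notebook rewriter, and the availability check identical results for a given path.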

When to use

- Setting up the migrator on a new workstation / new tenancy combo.
- Any time a tool reports `S3 bucket "<name>" not found in OCI bucket mapping. Known buckets: [...]`.
- After provisioning a new OCI Object Storage bucket that mirrors a Databricks-side S3 bucket.

The config file

The migrator loads bucket mappings via the `load_bucket_mapping()` helper. The user supplies a JSON file with this shape (the file path is configurable via `--bucket-mapping`):

```
{
  "buckets": {
    "<s3-bucket-name>": {
      "oci_bucket": "<oci-bucket-name>",
      "oci_namespace": "<oci-namespace>",
      "notes": "optional human note"
    },
    "<another-s3-bucket>": {
      "oci_bucket": "<another-oci-bucket>",
      "oci_namespace": "<oci-namespace>"
    }
  },
  "default_namespace": "<oci-namespace>",
  "default_region": "<oci-region>"
}
```

| Field | Meaning |
| --- | --- |
| `buckets.<s3-name>.oci_bucket` | Target OCI Object Storage bucket. |
| `buckets.<s3-name>.oci_namespace` | OCI tenancy namespace (NOT the DataLake namespace; these can differ). |
| `default_namespace` | Used when a path references an `oci://` URL without an explicit `@<ns>`. |
| `default_region` | Used to construct the OCI client. |

Save the file to a path the user controls and keep it gitignored, because it contains tenancy-specific identifiers. Pass it via `--bucket-mapping <path>` to every migrator entrypoint.
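
Before handing the file to the migrator, a quick shape check can catch typos early. The sketch below is independent of the migrator's `load_bucket_mapping()` helper, whose internals are not shown here; `check_mapping` and `read_mapping_file` are hypothetical names, and treating both `oci_bucket` and `oci_namespace` as required per bucket is an assumption drawn from the example above.

```python
import json
from pathlib import Path

# Assumption: every bucket entry needs both fields, as in the example file.
REQUIRED_ENTRY_KEYS = ("oci_bucket", "oci_namespace")


def check_mapping(mapping: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    buckets = mapping.get("buckets")
    if not isinstance(buckets, dict) or not buckets:
        return ['top-level "buckets" must be a non-empty object']
    problems = []
    for name, entry in buckets.items():
        if "://" in name:
            problems.append(f"{name!r}: use the bare bucket name, without a scheme prefix")
        if not isinstance(entry, dict):
            problems.append(f"{name!r}: entry must be an object")
            continue
        for key in REQUIRED_ENTRY_KEYS:
            if not entry.get(key):
                problems.append(f"{name!r}: missing or empty {key!r}")
    return problems


def read_mapping_file(path: str) -> dict:
    """Load the JSON file and raise ValueError if the shape check fails."""
    mapping = json.loads(Path(path).read_text(encoding="utf-8"))
    problems = check_mapping(mapping)
    if problems:
        raise ValueError("invalid bucket mapping:\n" + "\n".join(problems))
    return mapping
```

Returning a list of problems rather than failing on the first one lets the user fix every entry in a single pass.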

Building the mapping for a new tenancy

If the user is doing this for the first time:

1. List the source S3 buckets referenced in the Databricks workspace. Quick way:
   ```
   # Run from the migrator repo; -h drops the filename prefix so sort -u dedupes the bucket URLs.
   grep -rhoE 's3://[a-z0-9.-]+' <source-databricks-checkout>/ | sort -u
   ```
   Or use the migrator's prep helper if available.
2. For each bucket, identify the target OCI bucket and namespace. Either:
   - The user provisions matching OCI buckets (recommended for big migrations, because it preserves bucket names).
   - The user routes everything into a single shared OCI bucket with prefix isolation.
3. Find the OCI namespace. Each OCI tenancy is assigned a single Object Storage namespace; retrieve it with:
   ```
   oci os ns get --profile <profile>
   # returns: {"data": "<your-namespace>"}
   ```
4. Write the JSON and place it at `config/bucket_mapping.json` (or wherever your team stores secrets). Make sure it's gitignored.
5. Test the mapping by re-running `aidp-check-data` with `--bucket-mapping <path>`. Any MISSING entries with `s3://...` paths should now resolve to `oci://...` and probe correctly.
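
The first four steps can also be scripted. The following is a rough sketch, not part of the migrator: `find_s3_buckets` and `skeleton_mapping` are hypothetical helpers, the regex is the same one used in the grep command, and the skeleton assumes the "matching OCI buckets" strategy in which each OCI bucket keeps its S3 name.

```python
import json
import re
from pathlib import Path

# Same pattern as the grep command, with the bucket name captured.
S3_BUCKET_RE = re.compile(r"s3://([a-z0-9.-]+)")


def find_s3_buckets(root: str) -> list[str]:
    """Collect the distinct S3 bucket names referenced under a source checkout."""
    buckets = set()
    for file in Path(root).rglob("*"):
        if not file.is_file():
            continue
        # Undecodable bytes are ignored so binary files do not abort the scan.
        text = file.read_text(encoding="utf-8", errors="ignore")
        buckets.update(S3_BUCKET_RE.findall(text))
    return sorted(buckets)


def skeleton_mapping(buckets: list[str], namespace: str, region: str) -> dict:
    """Build a starter mapping where each OCI bucket reuses the S3 bucket name."""
    return {
        "buckets": {
            name: {"oci_bucket": name, "oci_namespace": namespace}
            for name in buckets
        },
        "default_namespace": namespace,
        "default_region": region,
    }


def write_skeleton(root: str, namespace: str, region: str, out_path: str) -> None:
    """Scan the checkout and write the starter mapping as JSON."""
    mapping = skeleton_mapping(find_s3_buckets(root), namespace, region)
    Path(out_path).write_text(json.dumps(mapping, indent=2), encoding="utf-8")
```

The generated file is only a starting point: review each entry, adjust `oci_bucket` wherever the OCI name differs, and then run the check in step 5.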

When the mapping should fail FAST (vs warn)

Behavior contract:

- `migrate_catalog.py` REJECTS unknown buckets when `--catalog-manifest` is explicit (data-correctness gate).
- `job_migrate.py` Pass-1 surfaces unknown buckets in the cell-fix Claude context so the model can either route to a synthetic stub OR ask the user.
- `check_data_availability.py` reports unknown-bucket as a hard MISSING row.

If you see WARNINGS but the migration continues, that is usually safe: the rewriter passed the path through unchanged. Confirm that the consumer notebook either no longer reads that path or has been adapted.
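
The difference between failing fast and warning can be illustrated with a small sketch. It assumes a hypothetical `strict` switch and function name; the real tools decide this per entrypoint as listed above, and their actual code may look quite different.

```python
import logging

logger = logging.getLogger("bucket_mapping_sketch")


def rewrite_location(path: str, mapping: dict, strict: bool) -> str:
    """Rewrite an s3:// path; on an unknown bucket, raise (strict) or pass through."""
    prefix = "s3://"
    if not path.startswith(prefix):
        return path
    bucket, _, key = path[len(prefix):].partition("/")
    entry = mapping["buckets"].get(bucket)
    if entry is None:
        if strict:
            # Fail fast: a wrong LOCATION would silently point at the wrong data.
            raise ValueError(f'S3 bucket "{bucket}" not found in OCI bucket mapping')
        # Lenient: log a warning and hand the path back unchanged.
        logger.warning("bucket %r is not mapped; leaving %s unchanged", bucket, path)
        return path
    return f"oci://{entry['oci_bucket']}@{entry['oci_namespace']}/{key}"
```

In the lenient case the returned path still starts with `s3://`, which is why the consumer notebook needs a follow-up check.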

Common mistakes

| Mistake | Fix |
| --- | --- |
| Confusing DataLake namespace vs OCI tenancy namespace | These can differ. `oci_namespace` in the mapping is the TENANCY namespace (`oci os ns get`), not the DataLake's internal namespace. |
| Hardcoding bucket names that include `s3://` prefix | Don't include `s3://`; use just the bucket name. |
| Forgetting to pass `--bucket-mapping <path>` to subsequent invocations | The path is per-run, not persisted. Add it to your `aidp-migrate-job` / `aidp-migrate-catalog` invocations. |
| Listing buckets the user doesn't actually have read access to | The mapping resolves the name; access errors surface at first read. Don't pre-mock buckets the user can't touch. |

After this

- Re-run `aidp-check-data`. Any MISSING rows that were due to bucket-map issues should now be OK.
- Proceed to `aidp-migrate-job` / `aidp-migrate-catalog`.

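The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was generated automatically with the Claude API.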