Skill: Official, development

🗺️aidp-migrator-overview

Plugin: oracle-ai-data-platform-workbench-databricks-migrator

Description

Router skill. Read this first whenever the user mentions migrating a Databricks workload (notebooks, jobs, catalogs, schedules) onto Oracle AI Data Platform (AIDP). Lays out the toolkit, the two-pass migration architecture, and which of the other aidp-* skills handles each phase. Compose those skills; this one adds no API surface.

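Use cases

✓ The user raises migrating a Databricks workload to Oracle AIDP
✓ Checking which toolkit components are available
✓ Getting an overview of the two-pass migration architecture
✓ Looking up which skill handles each migration phase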


`aidp-migrator-overview` — router

Migrating a Databricks workload onto AIDP is a multi-phase operation. This skill picks the right next skill based on what the user asks.

When to use

The user mentions "migrate Databricks", "port from Databricks to AIDP", "Unity Catalog migration", "DBX → AIDP", "lift-and-shift Databricks job", or similar.
The user asks "where do I start" with the migrator toolkit.
The user asks for an overview of the migrator architecture.

The two-pass architecture (mental model)

┌────────────────────┐   ┌──────────────────────┐   ┌─────────────────────┐
│ Pass-0: Plan       │ → │ Pass-1: Dep code     │ → │ Pass-2: Execute     │
│  build_dag.py      │   │  ensure_migrated()   │   │  job_migrate.py     │
│  check_data_       │   │  walks %run tree,    │   │  cell-by-cell on a  │
│  availability.py   │   │  rewrites Databricks │   │  live AIDP cluster, │
│                    │   │  APIs in each dep    │   │  4-way verify, up   │
│  (read-only)       │   │  notebook (no run)   │   │  to 10 fix attempts │
└────────────────────┘   └──────────────────────┘   └─────────────────────┘
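
As a rough illustration of the Pass-1 idea of walking the %run tree, the sketch below collects every notebook reachable from a root notebook, dependencies first. It is not the engine's ensure_migrated() implementation. The names walk_run_tree, resolve, RUN_MAGIC, and the in-memory notebooks dictionary are hypothetical, and the sketch assumes the notebook source contains bare %run lines.

```python
import posixpath
import re

# Matches a bare `%run` magic line such as `%run ./lib/utils` or `%run "../shared/io"`.
# Assumption: the source is not in the `# MAGIC %run ...` export format.
RUN_MAGIC = re.compile(r'^\s*%run\s+"?([^"\s]+)"?', re.MULTILINE)


def resolve(parent, target):
    """Resolve a %run target relative to the notebook that contains it."""
    if target.startswith("/"):
        return posixpath.normpath(target)
    return posixpath.normpath(posixpath.join(posixpath.dirname(parent), target))


def walk_run_tree(root, sources):
    """Return notebooks reachable from root via %run, dependencies first."""
    order = []
    seen = set()

    def visit(path):
        if path in seen:  # also guards against %run cycles
            return
        seen.add(path)
        for target in RUN_MAGIC.findall(sources.get(path, "")):
            visit(resolve(path, target))
        order.append(path)  # appended only after all of its dependencies

    visit(root)
    return order


# Hypothetical notebook paths and contents, for illustration only.
notebooks = {
    "/jobs/etl/main": "%run ./helpers\n%run ../shared/io\nprint('main')",
    "/jobs/etl/helpers": "%run ../shared/io\nx = 1",
    "/jobs/shared/io": "def read(): pass",
}
deps_first = walk_run_tree("/jobs/etl/main", notebooks)
```

Visiting dependencies before the notebook that includes them means each shared notebook is handled once, before anything that relies on it.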

The catalog migration is a SEPARATE flow, run BEFORE Pass-2 (so the
schemas + table locations exist when migrated notebooks try to read them):

┌──────────────────────────────────┐   ┌──────────────────────────────────┐
│ extract_catalog_databricks.py    │ → │ migrate_catalog.py               │
│  REST against Unity Catalog API  │   │  18 DDL rewrite rules → batched  │
│  → reports/catalog_pack.json     │   │  CREATE SCHEMA / CREATE TABLE    │
│                                  │   │  on AIDP in a single WS execute  │
└──────────────────────────────────┘   └──────────────────────────────────┘
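
The second box can be pictured as a rule pipeline followed by batching. The sketch below is only a guess at that shape: the layout of the pack dictionary is an assumption (the real reports/catalog_pack.json schema is not described here), and TOY_RULES are placeholder rewrites, not any of the toolkit's 18 DDL rewrite rules.

```python
import re

# Placeholder (pattern, replacement) rules, applied in order.
# These are NOT the toolkit's actual DDL rewrite rules.
TOY_RULES = [
    (re.compile(r"\s+"), " "),   # collapse runs of whitespace
    (re.compile(r";\s*$"), ""),  # drop a trailing semicolon
]


def rewrite(ddl, rules=TOY_RULES):
    """Apply each rewrite rule in order to a single DDL statement."""
    for pattern, replacement in rules:
        ddl = pattern.sub(replacement, ddl)
    return ddl.strip()


def build_batch(pack):
    """Build one ordered batch: all CREATE SCHEMA statements, then tables."""
    schemas = [f"CREATE SCHEMA IF NOT EXISTS {name}" for name in pack["schemas"]]
    tables = [rewrite(table["ddl"]) for table in pack["tables"]]
    return schemas + tables


# Hypothetical pack contents, for illustration only.
pack = {
    "schemas": ["sales"],
    "tables": [{"ddl": "CREATE TABLE sales.orders (\n  id BIGINT\n);\n"}],
}
batch = build_batch(pack)
```

Emitting schemas before tables keeps every CREATE TABLE in the batch pointed at a schema that already exists.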

Pick the right skill for the user's ask

User says	Skill to invoke
"Migrate this Databricks job", "port this workflow", "convert to AIDP"	`aidp-migrate-job`
"Build a migration manifest", "what would migrate", "show the DAG"	`aidp-build-dag`
"Are my source tables available?", "pre-migration check", "is the data ready"	`aidp-check-data`
"Resume the migration", "skip already-migrated", "pick up where I left off"	`aidp-resume-migration`
"Cell N is failing", "fix this notebook", "retry from cell K"	`aidp-fixup-cell`
"Migrate the Unity Catalog", "port the HMS schemas", "DDL migration"	`aidp-migrate-catalog`
"Map s3 buckets to OCI", "configure bucket mapping"	`aidp-bucket-mapping`
"Streaming convergence", "acceptance contract", "wait for pipeline to settle"	`aidp-acceptance-contract`
First time using this toolkit, "what do I need to install"	`aidp-migrator-bootstrap`
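
The routing table above can be approximated with a small keyword lookup. This is a simplified sketch for illustration: in practice the router is prose guidance read by the model, not code, and ROUTES and pick_skill are hypothetical names. More specific phrases are listed before the generic "migrate" and "port" so that, for example, a catalog request is not routed to the job skill.

```python
# Ordered (keywords, skill) pairs; the first match wins.
ROUTES = [
    (("resume", "skip already-migrated", "pick up where"), "aidp-resume-migration"),
    (("cell", "fix this notebook", "retry from"), "aidp-fixup-cell"),
    (("unity catalog", "hms", "ddl"), "aidp-migrate-catalog"),
    (("bucket",), "aidp-bucket-mapping"),
    (("convergence", "acceptance contract", "settle"), "aidp-acceptance-contract"),
    (("manifest", "what would migrate", "dag"), "aidp-build-dag"),
    (("source tables", "pre-migration check", "data ready"), "aidp-check-data"),
    (("install", "first time"), "aidp-migrator-bootstrap"),
    (("migrate", "port", "convert"), "aidp-migrate-job"),
]


def pick_skill(utterance):
    """Return the first skill whose keywords appear in the utterance."""
    text = utterance.lower()
    for keywords, skill in ROUTES:
        if any(keyword in text for keyword in keywords):
            return skill
    # Nothing matched: stay on the overview and ask what the user needs.
    return "aidp-migrator-overview"
```

Plain substring matching is crude (it would also match "cell" inside unrelated words), so treat this as a picture of the mapping rather than a usable classifier.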

What the user must have set up before any of this

This plugin is self-contained — the full migrator engine ships bundled under ${CLAUDE_PLUGIN_ROOT}/engine/. Before any skill in this plugin can do real work, the user needs:

Engine Python dependencies installed. Run pip install -r ${CLAUDE_PLUGIN_ROOT}/engine/requirements.txt once. The aidp-migrator-bootstrap skill walks through this and the remaining checks.
A ~/.oci/config file with either an api_key profile (for unattended runs) or a session-token profile (for interactive runs).
An ACTIVE AIDP cluster. Pass-2 of the engine runs over the WebSocket execute path, which requires a live cluster. If the cluster is stopped, ask the user to start it from the AIDP console before invoking aidp-migrate-job.
ANTHROPIC_API_KEY in the environment. The engine calls Claude with tool use for every cell rewrite; without this key the Pass-2 loop won't run.
An env-coords.md file (see references/env-coords.template.md). The user fills in the DataLake OCID, workspace UUID, cluster ID, AIDP base URL, and OCI profile name once; every other skill threads these values through.
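
Some of these prerequisites can be checked locally before any skill runs. The sketch below covers only the three file and environment checks. It does not verify the installed Python packages or the cluster state, and it is not the aidp-migrator-bootstrap logic. The preflight function and its messages are hypothetical.

```python
import tempfile
from pathlib import Path


def preflight(env, home, coords_path):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not (home / ".oci" / "config").is_file():
        problems.append("missing ~/.oci/config")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if not coords_path.is_file():
        problems.append(f"missing {coords_path.name}")
    return problems


# Demonstration against a throwaway directory instead of the real home.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    coords = home / "env-coords.md"
    before = preflight({}, home, coords)

    (home / ".oci").mkdir()
    (home / ".oci" / "config").write_text("[DEFAULT]\n")
    coords.write_text("# environment coordinates\n")
    after = preflight({"ANTHROPIC_API_KEY": "placeholder"}, home, coords)
```

Against a real workstation the call would look like preflight(os.environ, Path.home(), Path("env-coords.md")), with the location of env-coords.md adjusted to wherever the user keeps it.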

What this plugin does NOT do

It does not migrate files accessed through Databricks dbutils.fs into OCI Object Storage; only table, notebook, and job constructs are migrated. File-level DBFS replication is a separate exercise.
It does not handle Databricks Workflows-on-Pipelines (DLT); only Jobs and their tasks are covered. DLT pipelines must be recreated manually as Spark Structured Streaming plus scheduling.
It does not migrate Databricks ML feature-store registrations or MLflow model versions automatically. Those need separate handling.
It does not provision the AIDP cluster, workspace, or DataLake. Assume those exist.

Key references

references/cli-map.md — every migrator CLI entrypoint mapped to its purpose.
references/gotchas.md — 15 Databricks → AIDP gotchas with fix recipes.
references/ddl-rewrite-rules.md — the 18 DDL rewrite rules.
references/env-coords.template.md — the scaffold every skill threads from.

Order of operations for a fresh migration

1. aidp-migrator-bootstrap: once per workstation.
2. aidp-migrate-catalog: schemas and tables FIRST, so notebook reads have targets.
3. aidp-bucket-mapping: only if migrating tables with s3:// external locations.
4. aidp-build-dag: produces reports/<job>_manifest.json.
5. aidp-check-data: verify before committing cluster time.
6. aidp-migrate-job: the big run.
7. aidp-fixup-cell: only if needed, for cells the auto-fix loop couldn't recover.
8. aidp-acceptance-contract: for streaming / batch convergence pipelines.
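
The conditional steps in this sequence can be written down as a small planner. This is a sketch under assumptions: migration_plan and its flag names are hypothetical, and only the skill names and their order come from the list above.

```python
def migration_plan(
    first_time_on_workstation,
    has_s3_external_tables,
    has_unrecovered_cells=False,
    needs_convergence_check=False,
):
    """Return the skills to run, in order, for a fresh migration."""
    steps = []
    if first_time_on_workstation:
        steps.append("aidp-migrator-bootstrap")
    # The catalog goes first so notebook reads have targets.
    steps.append("aidp-migrate-catalog")
    if has_s3_external_tables:
        steps.append("aidp-bucket-mapping")
    steps += ["aidp-build-dag", "aidp-check-data", "aidp-migrate-job"]
    if has_unrecovered_cells:
        steps.append("aidp-fixup-cell")
    if needs_convergence_check:
        steps.append("aidp-acceptance-contract")
    return steps


minimal = migration_plan(first_time_on_workstation=False, has_s3_external_tables=False)
full = migration_plan(True, True, has_unrecovered_cells=True, needs_convergence_check=True)
```

In a real run, whether aidp-fixup-cell is needed only becomes known after aidp-migrate-job finishes, so the plan would be re-evaluated at that point.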

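The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was machine-generated with the Claude API.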