Skill · Official · development

📋aidp-build-dag

Plugin: oracle-ai-data-platform-workbench-databricks-migrator

Description

Builds a migration manifest (the execution DAG) from a Databricks workspace path or workflow ID. It walks `%run` dependency chains, captures `dbutils.notebook.run` invocations, and emits `reports/<job>_manifest.json`, the input every other execute-skill consumes. Use it when the user wants to see what would migrate, or before invoking `aidp-migrate-job` for the first time on a new workload.

Use cases

✓ Preview what would be migrated
✓ Before the first migration run on a new workload

`aidp-build-dag` — build the migration manifest

The execution DAG is what the migrator reads to know which notebooks to migrate, in what order, with what dependencies. Build it once per workload.

When to use

User asks "what would migrate", "show me the dependency tree", "build the manifest".
Before any aidp-migrate-job invocation against a new workload.
After changing the Databricks workspace path or workflow ID.

Two entry points

The migrator ships two DAG builders. Pick based on input shape:

Input	Entrypoint
Path-based: a folder of `.ipynb` / `.py` notebooks the user wants migrated	`${CLAUDE_PLUGIN_ROOT}/engine/scripts/build_dag.py`
Workflow-based: a Databricks Job ID whose tasks the user wants migrated, preserving the task DAG	`${CLAUDE_PLUGIN_ROOT}/engine/scripts/build_dag_from_workflow.py`

Path-based invocation

python3 ${CLAUDE_PLUGIN_ROOT}/engine/scripts/build_dag.py \
  --root "<databricks-workspace-path>" \
  --job-name "<MyJob>" \
  --output reports/<MyJob>_manifest.json

--root — Databricks workspace folder containing the entry notebooks. The script walks every *.ipynb / *.py under this prefix.
--job-name — a name for the manifest (used as the output-base subdirectory).
--output — manifest write path.

The builder follows %run chains AND dbutils.notebook.run(...) calls to build a topo-ordered DAG. It also flags transitive deps so Pass-1 knows which notebooks to migrate code-only first.
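The reference-following step described above can be pictured with a small scanner. This is only a sketch: the regular expressions, the `find_literal_targets` helper, and the sample notebook text are illustrative assumptions, not the parsing rules `build_dag.py` actually uses. It picks up `%run` targets (including the `# MAGIC %run` form used in `.py` notebook exports) and `dbutils.notebook.run` calls whose first argument is a string literal.

```python
import re

# Assumed patterns for illustration; the real builder's rules are not shown in the source.
RUN_MAGIC = re.compile(r"^\s*(?:# MAGIC\s+)?%run\s+(\S+)", re.MULTILINE)
NOTEBOOK_RUN = re.compile(r"dbutils\.notebook\.run\(\s*[\"']([^\"']+)[\"']")


def find_literal_targets(source: str) -> list[str]:
    """Return notebook paths referenced by %run or by a literal dbutils.notebook.run path."""
    targets = RUN_MAGIC.findall(source) + NOTEBOOK_RUN.findall(source)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(targets))


# Hypothetical notebook source used only to exercise the helper.
sample = """
%run ./helpers/io_utils
df = load()
dbutils.notebook.run("/Shared/etl/transform", 600)
dbutils.notebook.run(target_path, 600)
"""

print(find_literal_targets(sample))
```

The last call in the sample passes a variable, so it is not reported; a scanner of this kind can only see paths that are written out literally.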

Workflow-based invocation

python3 ${CLAUDE_PLUGIN_ROOT}/engine/scripts/build_dag_from_workflow.py \
  --job-id <databricks-job-id> \
  --output reports/<MyJob>_manifest.json

This pulls the Job's task definitions via the Databricks Jobs REST API and converts depends_on task edges into the manifest's DAG. Use when the user wants the AIDP migration to mirror the Databricks Workflow shape (vs. just inferring dependencies from %run).

Required environment variables:

DATABRICKS_HOST — https://<workspace>.cloud.databricks.com
DATABRICKS_TOKEN — a PAT with workspace-read permission
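As a rough illustration of the conversion described above, the sketch below reads a job definition from the Jobs API 2.1 `jobs/get` endpoint and maps each task's `depends_on` edges onto manifest-style entries. The helper names `fetch_job` and `job_to_manifest` and the sample payload are assumptions made for this example; how `build_dag_from_workflow.py` does this internally is not shown in the source.

```python
import json
import os
import urllib.parse
import urllib.request


def fetch_job(job_id: int) -> dict:
    """GET the job definition using DATABRICKS_HOST and DATABRICKS_TOKEN."""
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    query = urllib.parse.urlencode({"job_id": job_id})
    request = urllib.request.Request(
        f"{host}/api/2.1/jobs/get?{query}",
        headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def job_to_manifest(job: dict) -> dict:
    """Map Jobs API task definitions onto manifest-style task entries."""
    settings = job.get("settings", {})
    tasks = []
    for task in settings.get("tasks", []):
        tasks.append({
            "task_key": task["task_key"],
            # Tasks that are not notebook tasks end up with a None path here.
            "notebook_path": task.get("notebook_task", {}).get("notebook_path"),
            "depends_on": [edge["task_key"] for edge in task.get("depends_on", [])],
        })
    return {"job_name": settings.get("name"), "tasks": tasks, "deps": []}


# Hypothetical payload shaped like a jobs/get response, so the mapping runs offline.
sample_job = {
    "settings": {
        "name": "MyJob",
        "tasks": [
            {"task_key": "extract", "notebook_task": {"notebook_path": "/Shared/extract"}},
            {"task_key": "transform", "notebook_task": {"notebook_path": "/Shared/transform"},
             "depends_on": [{"task_key": "extract"}]},
        ],
    }
}

print(json.dumps(job_to_manifest(sample_job), indent=2))
# Against a live workspace: job_to_manifest(fetch_job(<databricks-job-id>))
```

Keeping the network call separate from the mapping makes the mapping easy to check against a saved job definition without credentials.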

Manifest shape

The output is JSON with this top-level structure:

{
  "job_name": "<MyJob>",
  "tasks": [
    {
      "task_key": "extract",
      "notebook_path": "Users/.../extract.ipynb",
      "depends_on": []
    },
    {
      "task_key": "transform",
      "notebook_path": "Users/.../transform.ipynb",
      "depends_on": ["extract"]
    }
  ],
  "deps": [
    {
      "notebook_path": "Users/.../helpers/io_utils.ipynb",
      "referenced_by": ["extract", "transform"]
    }
  ]
}

tasks are the named entry points; deps are %run / notebook.run targets discovered transitively. Pass-1 migrates the deps first (code-only), then Pass-2 executes the tasks in topo order.
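To make the two-pass ordering concrete, here is a minimal sketch that derives a plan from a manifest of the shape shown above. The `migration_plan` helper is hypothetical and uses the standard-library `graphlib` module (Python 3.9+); the migrator's own scheduling code is not shown in the source and may differ.

```python
from graphlib import TopologicalSorter


def migration_plan(manifest: dict) -> tuple[list[str], list[str]]:
    """Return (Pass-1 dep notebook paths, Pass-2 task keys in dependency order)."""
    pass1 = [dep["notebook_path"] for dep in manifest.get("deps", [])]
    # Map each task to the set of tasks that must run before it.
    graph = {
        task["task_key"]: set(task.get("depends_on", []))
        for task in manifest.get("tasks", [])
    }
    pass2 = list(TopologicalSorter(graph).static_order())
    return pass1, pass2


# The example manifest from above, with its elided paths left as-is.
manifest = {
    "job_name": "<MyJob>",
    "tasks": [
        {"task_key": "extract", "notebook_path": "Users/.../extract.ipynb", "depends_on": []},
        {"task_key": "transform", "notebook_path": "Users/.../transform.ipynb",
         "depends_on": ["extract"]},
    ],
    "deps": [
        {"notebook_path": "Users/.../helpers/io_utils.ipynb",
         "referenced_by": ["extract", "transform"]},
    ],
}

pass1, pass2 = migration_plan(manifest)
print("Pass-1 (code-only):", pass1)
print("Pass-2 (execute):", pass2)
```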

Sanity-check the manifest before running

After the builder finishes, do these three reads — small, fast, catch most config issues:

# 1. count (parenthesized so each length applies to its own array)
jq '(.tasks | length), (.deps | length)' reports/<MyJob>_manifest.json

# 2. topo correctness: list every depends_on edge, then confirm by eye that there are no cycles and that each dependency names an existing task
jq '.tasks[] | select(.depends_on | length > 0) | {task: .task_key, deps: .depends_on}' reports/<MyJob>_manifest.json

# 3. notebook paths: list every task path, then confirm each one exists in the Databricks workspace
jq -r '.tasks[].notebook_path' reports/<MyJob>_manifest.json

If any notebook_path doesn't exist in Databricks, the builder logs a warning but does NOT fail. Catch it here.
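The jq reads above only print the data, so judging cycles and dangling edges is left to the reader. A small script can do that part mechanically. The sketch below is an assumption-laden example rather than part of the plugin: `validate_manifest` is a hypothetical helper that reports duplicate task keys, `depends_on` entries that name no task, and cycles.

```python
from graphlib import CycleError, TopologicalSorter


def validate_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems found in the manifest's task graph."""
    problems = []
    tasks = manifest.get("tasks", [])
    keys = [task["task_key"] for task in tasks]
    known = set(keys)
    if len(keys) != len(known):
        problems.append("duplicate task_key values")

    graph = {}
    for task in tasks:
        upstream = task.get("depends_on", [])
        for name in upstream:
            if name not in known:
                problems.append(f"{task['task_key']} depends on unknown task {name!r}")
        graph[task["task_key"]] = set(upstream)

    try:
        # prepare() raises CycleError when the graph cannot be ordered.
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        problems.append(f"cycle detected: {exc.args[1]}")
    return problems


# Hypothetical manifests used to exercise the checks.
cyclic = {"tasks": [
    {"task_key": "a", "depends_on": ["b"]},
    {"task_key": "b", "depends_on": ["a"]},
]}
dangling = {"tasks": [{"task_key": "transform", "depends_on": ["missing"]}]}

for name, candidate in (("cyclic", cyclic), ("dangling", dangling)):
    print(name, validate_manifest(candidate))
```

To check a real manifest, load it with `json.load` and pass the resulting dict to the helper. An empty list means none of these three problems was found; it does not confirm that the notebook paths exist in Databricks.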

Known caveats

dbutils.notebook.run with dynamic paths. If the source notebook builds the target path at runtime (dbutils.notebook.run(some_var, ...)), the builder cannot resolve it. The dep won't appear in the manifest and the runtime call will fail post-migration. Tell the user to either (a) make the path literal, or (b) add the target to a dep_hints section manually.
Workflow tasks with for_each_task. Not modeled. Convert to a regular notebook_task first.
Sub-workflows (Workflow-runs-Workflow). The DAG builder follows the OUTER workflow only. Nested workflows need a separate manifest each.
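For the dynamic-path caveat, it can help to list the offending calls before building the manifest. The sketch below is a heuristic, and both the regular expression and the `find_dynamic_calls` helper are assumptions made for illustration: it flags `dbutils.notebook.run` calls whose first argument is not a plain quoted string, which are the targets a user would need to make literal or record by hand.

```python
import re

# Captures the first argument of each call, up to the first comma or closing parenthesis.
CALL = re.compile(r"dbutils\.notebook\.run\(\s*([^,)]+)")


def find_dynamic_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, first argument) for calls whose path is not a string literal."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            first_arg = match.group(1).strip()
            quote = first_arg[:1]
            is_literal = (
                quote in ('"', "'")
                and len(first_arg) >= 2
                and first_arg.endswith(quote)
            )
            if not is_literal:
                flagged.append((lineno, first_arg))
    return flagged


# Hypothetical notebook lines: one literal path, one variable, one f-string.
sample_lines = "\n".join([
    'dbutils.notebook.run("/Shared/etl/load", 60)',
    'dbutils.notebook.run(some_var, 60)',
    'dbutils.notebook.run(f"/Shared/{env}/load", 60)',
])

for lineno, argument in find_dynamic_calls(sample_lines):
    print(f"line {lineno}: dynamic target {argument}")
```

A line-based regular expression misses calls that span several lines or hide the path behind a wrapper function, so treat the output as a starting list rather than a complete one.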

After this

Once the manifest looks right:

Run aidp-check-data to verify source tables exist on the cluster.
Run aidp-migrate-job with --manifest reports/<MyJob>_manifest.json.

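The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was produced automatically with the Claude API.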