Skill · Official · development

📚aidp-catalog-init

Plugin: oracle-ai-data-platform-workbench-engineer-agent

Description

One-time AIDP catalog discovery that writes a cached, version-controllable `.aidp/catalog.md` grounding file (tables, columns, FK/join hints, value dictionaries). Use it when the user says `/aidp-catalog-init`, asks to "map/discover my lakehouse" or to set up data discovery, or before answering data questions when no catalog cache exists. Re-run with `--refresh` when the schema changes.

Use cases

✓ The user enters `/aidp-catalog-init`
✓ The user wants to map or discover their lakehouse
✓ The user asks to set up data discovery
✓ A data question needs answering and no catalog cache exists yet


`aidp-catalog-init` — build the catalog grounding file

Walk the AIDP catalog tree and generate .aidp/catalog.md — the cached, user-editable grounding file that makes subsequent NL-to-SQL fast and accurate. Discovery is pure control-plane — no SQL, no compute (except optional --with-counts, which uses the bundled SQL helper). Self-contained: no aidp MCP required.

When to use

First-time setup, or --refresh after schema changes.

Engine — official `aidp` CLI (control-plane, no compute)

Preferred engine is the official Oracle aidp CLI; oci raw-request is the fallback when the CLI isn't installed. Both hit the same data-plane REST API with the same auth — see references/aidp-cli-map.md for the full skill→command map and references/oci-raw-request.md for base URL + auth ladder + conventions.

CLI (preferred):

```
# 1. catalogs
aidp catalog list --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region <r>
# 2. schemas in a catalog
aidp schema list --catalog-key <cat> --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region <r>
# 3. tables in a schema (schema-key is the dotted <cat.schema>)
aidp schema list-tables --catalog-key <cat> --schema-key <cat.schema> --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region <r>
# single catalog/schema/table: aidp catalog get · aidp schema get · aidp schema get-table
```
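The three listing commands differ only in their subcommand and key flags, so a wrapper can assemble them from one shared set of connection flags. The following is a minimal Python sketch that only builds argument lists from the flags shown above. The function names are illustrative and not part of the aidp CLI, and it assumes the caller passes a bare schema name that still has to be joined with the catalog to form the dotted schema key.

```python
from typing import List


def common_flags(instance_id: str, region: str, profile: str = "DEFAULT") -> List[str]:
    """Connection flags repeated on every discovery command above."""
    return ["--instance-id", instance_id, "--auth", "api_key",
            "--profile", profile, "--region", region]


def catalog_list_cmd(instance_id: str, region: str) -> List[str]:
    """Step 1: list catalogs."""
    return ["aidp", "catalog", "list", *common_flags(instance_id, region)]


def schema_list_cmd(catalog: str, instance_id: str, region: str) -> List[str]:
    """Step 2: list schemas in one catalog."""
    return ["aidp", "schema", "list", "--catalog-key", catalog,
            *common_flags(instance_id, region)]


def list_tables_cmd(catalog: str, schema: str, instance_id: str, region: str) -> List[str]:
    """Step 3: list tables in one schema."""
    # Assumption: `schema` is the bare schema name, so the dotted key is built here.
    schema_key = f"{catalog}.{schema}"
    return ["aidp", "schema", "list-tables", "--catalog-key", catalog,
            "--schema-key", schema_key, *common_flags(instance_id, region)]
```

Each list can be handed to `subprocess.run` unchanged. Building argument lists rather than shell strings avoids quoting problems with OCIDs.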

Fallback (no CLI installed) — oci raw-request (LIVE-VERIFIED 20240831 / dataLakes / --profile DEFAULT — see references/no-mcp-rest-map.md):

```
B="https://aidp.<region>.oci.oraclecloud.com/20240831/dataLakes/<DATALAKE_OCID>"
oci raw-request --http-method GET --target-uri "$B/catalogs" --profile DEFAULT
oci raw-request --http-method GET --target-uri "$B/schemas?catalogKey=<cat>" --profile DEFAULT
oci raw-request --http-method GET --target-uri "$B/tables?catalogKey=<cat>&schemaKey=<cat.schema>" --profile DEFAULT
```
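When the fallback is scripted, the target URIs can be composed programmatically so that query parameters are always encoded and never omitted. This is a small Python sketch under that assumption. The helper names are hypothetical, and only the base URL pattern, the paths, and the query parameter names come from the commands above.

```python
from typing import List
from urllib.parse import urlencode

API_VERSION = "20240831"  # version segment used in the fallback commands above


def base_url(region: str, datalake_ocid: str) -> str:
    """Base URL for one DataLake, matching the B=... line above."""
    return f"https://aidp.{region}.oci.oraclecloud.com/{API_VERSION}/dataLakes/{datalake_ocid}"


def schemas_url(base: str, catalog: str) -> str:
    query = urlencode({"catalogKey": catalog})
    return f"{base}/schemas?{query}"


def tables_url(base: str, catalog: str, schema_key: str) -> str:
    # schema_key is the dotted <cat.schema> form.
    query = urlencode({"catalogKey": catalog, "schemaKey": schema_key})
    return f"{base}/tables?{query}"


def raw_get_cmd(url: str, profile: str = "DEFAULT") -> List[str]:
    """Argument list for an `oci raw-request` GET against the given URL."""
    return ["oci", "raw-request", "--http-method", "GET",
            "--target-uri", url, "--profile", profile]
```

For example, `raw_get_cmd(tables_url(base_url(region, ocid), "default", "default.sales"))` yields the third fallback command for the `default.sales` schema.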

Single table / columns — aidp schema get-table (or the REST tables?… list, which returns columns, types, and properties); filter to the one table client-side by its key (no dedicated single-table param confirmed — see no-mcp-rest-map.md).
Per-endpoint params are required: a bare path returns 400 InvalidParameter: query param X must not be null, which names the missing param.
On 401/403/"Security Token", follow the auth ladder (refresh AIDP_SESSION, retry with --auth security_token) in oci-raw-request.md.
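Two of the notes above lend themselves to small helpers: picking one table out of a list response, and reading the missing parameter name out of a 400 message. The Python sketch below makes two assumptions that the source does not confirm: list responses are shaped like `{"items": [...]}`, and each item carries its identifier under a `"key"` field. Both function names are illustrative.

```python
import re
from typing import Optional

# Matches the error text quoted above: "query param X must not be null".
_MISSING_PARAM = re.compile(r"query param (\w+) must not be null")


def find_table(response: dict, table_key: str) -> Optional[dict]:
    """Client-side filter: return the first item whose key matches, else None.

    Assumes a {"items": [...]} response whose entries have a "key" field.
    """
    for item in response.get("items", []):
        if item.get("key") == table_key:
            return item
    return None


def missing_param(error_message: str) -> Optional[str]:
    """Extract the parameter name from a 400 InvalidParameter message, if present."""
    match = _MISSING_PARAM.search(error_message)
    return match.group(1) if match else None
```

If the real payload uses a different field name for the table identifier, only the `item.get("key")` lookup needs to change.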

Process

1. Walk the tree (no compute): aidp catalog list → for each catalog, aidp schema list --catalog-key → for each schema, aidp schema list-tables --catalog-key --schema-key (columns, types, properties), or use the REST fallback above. For large catalogs, fan out one subagent per catalog to parallelize discovery.
2. Capture grounding hints (this is what raises NL-to-SQL accuracy):
   - FK/join hints: infer likely join keys from naming (*_sk, *_id, shared column names) and any declared keys in the table properties. Record them so the agent doesn't guess joins later.
   - Value dictionaries: for low-cardinality categorical columns, note canonical values and formats (prevents wrong WHERE literals like "California" vs "CA"). Pull distinct values only when cheap (--with-counts path), or mark them TODO.
   - Large-table flags: flag big fact tables ("always filter by date").
3. Enrich from the codebase if present (existing notebooks, SQL files, CLAUDE.md) for descriptions.
4. Write .aidp/catalog.md with sections: Quick Reference (concept→table), Catalogs → schemas → tables (columns, types, join keys, flags), Value dictionaries, Gotchas. Preserve user edits and HTML comments on --refresh; flag removed tables with a marker.
5. Summarize to the user (N catalogs / schemas / tables, large tables flagged) and suggest next steps (aidp-semantic-model for metrics, aidp-analyzing-data to ask questions).
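The naming-based join inference in the grounding-hints step can be sketched as follows. This is a deliberately narrow illustration, not the skill's actual heuristic: it only pairs tables that share an identically named column ending in `_sk` or `_id`, so prefixed variants of the same key and keys declared in table properties are not covered. The function name and input shape are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

KEY_SUFFIXES = ("_sk", "_id")  # naming conventions mentioned in the step above


def infer_join_hints(tables: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (column, table_a, table_b) for key-like columns shared by two tables.

    `tables` maps a table name to its column names.
    """
    owners = defaultdict(list)
    for table, columns in tables.items():
        for column in columns:
            name = column.lower()
            if name.endswith(KEY_SUFFIXES):
                owners[name].append(table)

    hints = []
    for column in sorted(owners):
        names = sorted(owners[column])
        # Emit every unordered pair of tables that share this column.
        for i, left in enumerate(names):
            for right in names[i + 1:]:
                hints.append((column, left, right))
    return hints
```

With `{"orders": ["order_id", "customer_id"], "customers": ["customer_id", "name"]}` as input, the only hint produced is `("customer_id", "customers", "orders")`, which could be written into the Notes column of both tables.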

Options

--refresh — regenerate, preserving user edits and Quick-Reference rows.
--catalog <name> — limit to one catalog.
--with-counts — also fetch row counts / distinct values via the bundled SQL helper (uses the cluster, off by default — it costs compute and needs a running cluster):
```
python "$PLUGIN_DIR/scripts/aidp_sql.py" --region <r> --datalake <DATALAKE_OCID> --workspace <ws> --cluster <key> \
  --code "spark.sql('SELECT COUNT(*) AS n FROM <cat>.<schema>.<table>').show()"
```
Returns JSON with status / outputs / spark_job_ids; mints a UPST from the api_key DEFAULT profile and auto-creates a scratch notebook (no AIDP_SESSION required). See references/oci-raw-request.md for the control-plane side.
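A caller that wants counts for many tables might wrap the helper invocation as below. This is a hedged Python sketch: the flags and the three JSON field names come from the text above, while the function names, the table-name validation, and the assumption that the helper's output is a single JSON object are illustrative.

```python
import json
import re
from typing import List

# Only accept plain three-part names so the table cannot break out of the SQL string.
_TABLE_NAME = re.compile(r"^\w+\.\w+\.\w+$")


def count_cmd(plugin_dir: str, region: str, datalake: str,
              workspace: str, cluster: str, table: str) -> List[str]:
    """Argument list for a row-count query on <cat>.<schema>.<table>."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"expected <cat>.<schema>.<table>, got {table!r}")
    code = f"spark.sql('SELECT COUNT(*) AS n FROM {table}').show()"
    return ["python", f"{plugin_dir}/scripts/aidp_sql.py",
            "--region", region, "--datalake", datalake,
            "--workspace", workspace, "--cluster", cluster,
            "--code", code]


def summarize_result(raw_json: str) -> dict:
    """Keep only the documented fields from the helper's JSON output."""
    doc = json.loads(raw_json)
    return {name: doc.get(name) for name in ("status", "outputs", "spark_job_ids")}
```

Because each call costs compute, a caller would typically run this only for tables it intends to flag as large or whose value dictionaries are still marked TODO.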

Output format (`.aidp/catalog.md`)

```
# AIDP catalog — generated <date> (edit freely)
## Quick Reference
| Concept | Table | Key |
|---|---|---|
| customers | default.default.customer | c_customer_sk |
## <catalog> → <schema>
#### <table>   (rows: <n if --with-counts>; LARGE if big)
| Column | Type | Notes (PK/FK/join) |
## Value dictionaries
## Gotchas
```
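To make the per-table part of the template concrete, here is a minimal Python sketch that renders one table section. The function name and argument shapes are assumptions. It also adds a `|---|---|---|` separator row under the column header, which the template above shows only for the Quick Reference table but which Markdown tables need in order to render.

```python
from typing import List, Optional, Tuple


def render_table_section(name: str,
                         columns: List[Tuple[str, str, str]],
                         rows: Optional[int] = None,
                         large: bool = False) -> str:
    """Render one '#### <table>' section with its column table.

    `columns` holds (column, type, notes) triples; `rows` is set only when
    counts were collected (the --with-counts path).
    """
    tags = []
    if rows is not None:
        tags.append(f"rows: {rows}")
    if large:
        tags.append("LARGE")
    joined = "; ".join(tags)
    suffix = f"   ({joined})" if tags else ""

    lines = [f"#### {name}{suffix}",
             "| Column | Type | Notes (PK/FK/join) |",
             "|---|---|---|"]
    for column, col_type, notes in columns:
        lines.append(f"| {column} | {col_type} | {notes} |")
    return "\n".join(lines)
```

Keeping the renderer free of I/O makes it easier to merge its output with preserved user edits during a `--refresh`.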

Notes

Resolve <region> / <DATALAKE_OCID> / <workspace> explicitly — catalog calls are scoped to the DataLake; the SQL helper is scoped to a workspace + cluster.
.aidp/ is git-ignored — it's a per-project cache, not shipped with the plugin.
Auto-Populate Catalog Extractor (bulk auto-cataloging from Object Storage) has a REST surface at …/dataLakes/<OCID>/extractors (NOT /metadataExtractors, which 404s — an earlier note probed the wrong path). LIVE-VERIFIED 2026-06-12: GET …/20240831/dataLakes/<OCID>/extractors → 200 {"items":[]}. Surface: GET/POST/DELETE /extractors, GET /extractors/<key>/extractedEntities, GET /extractors/<key>/extractedTables/<name>, POST /extractors/<key>/actions/manageExtractedEntities (accept/reject/import), lifecycle ACCEPTED→IN_PROGRESS→SUCCEEDED/FAILED/IN_REVIEW. This complements (does not replace) the discovery walk above and aidp-ingest-file-to-table. Probe the create/manage write paths live (need an Object Storage source) before relying on them.
The aidp MCP is an optional accelerator — if one is configured you may use list_catalogs / list_schemas / list_tables / get_table instead of the raw calls, but it is not required.
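For the extractor surface described in the notes, the per-extractor paths and the lifecycle can be captured in a few lines. This Python sketch is illustrative only: the paths and state names are taken from the note above, while the function names and the decision to treat ACCEPTED and IN_PROGRESS as the only states worth polling are assumptions that should be checked against a live instance, as the note itself advises for the write paths.

```python
from urllib.parse import quote

# Assumption: these are the in-flight states; SUCCEEDED, FAILED and IN_REVIEW end polling.
IN_FLIGHT_STATES = {"ACCEPTED", "IN_PROGRESS"}


def extractor_paths(key: str, table_name: str) -> dict:
    """Paths relative to .../dataLakes/<OCID> for one extractor."""
    safe_key = quote(key, safe="")
    safe_table = quote(table_name, safe="")
    return {
        "entities": f"/extractors/{safe_key}/extractedEntities",
        "table": f"/extractors/{safe_key}/extractedTables/{safe_table}",
        "manage": f"/extractors/{safe_key}/actions/manageExtractedEntities",
    }


def still_running(state: str) -> bool:
    """True while an extractor run is worth polling again."""
    return state in IN_FLIGHT_STATES
```

A run that lands in IN_REVIEW presumably waits for an accept, reject, or import decision through the manage action, which is why this sketch stops polling at that point.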

References

references/aidp-cli-map.md — skill → official aidp CLI command map (primary engine)
references/oci-raw-request.md · references/no-mcp-rest-map.md · references/semantic-model.md

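The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was produced automatically with the Claude API.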