スキルOfficialdevelopment

🔗aidp-federate

プラグイン: oracle-ai-data-platform-workbench-engineer-agent
ソース: GitHub で見る ↗

説明

1つのAIDP Sparkセッション内で複数のデータソースにわたるフェデレーションを実行します。複数のコネクター（Oracle ADB/ExaCS、Fusion、Snowflake、S3、レイクハウステーブルなど）からデータを読み込み、単一のノートブック上でそれらを結合することができます。次のような場合に使用: ユーザーが複数のソースからデータを組み合わせ・結合したい場合、外部システムとレイクハウスをブレンドしたい場合、またはクロスソース分析を行いたい場合。 spark-connectors pluginを組み合わせて動作します。コネクターの定義は重複して持ちません。

原文を表示

Federate across multiple data sources in one AIDP Spark session — read from several connectors (Oracle ADB/ExaCS, Fusion, Snowflake, S3, lakehouse tables, …) and join them in a single notebook. Use when the user wants to combine/join data from more than one source, blend an external system with the lakehouse, or do cross-source analysis. Composes the spark-connectors plugin; does not duplicate connectors.

ユースケース

✓複数のデータソースを結合したい
✓外部システムとレイクハウスをブレンドしたい
✓クロスソース分析を行いたい

本文（日本語訳）

`aidp-federate` — 1つのSparkセッションによるクロスソースフェデレーション

複数のソースをそれぞれ1つのSparkセッションに読み込んでJoinすることで、データを統合します。これは競合Agentとの重要な差別化ポイントです（競合Agentは外部テーブルに対してパフォーマンスが低下するか、外部オーケストレーターと手動キーJoinが必要になります）。実行はバンドル済みの scripts/aidp_sql.py ヘルパー経由で行われます（MCP不要、AIDP_SESSION不要）。

次のような場合に使用

「<ソースA> と <ソースB> のデータをJoinしたい」「FusionとLakehouseを統合したい」「SnowflakeとStore Salesをブレンドしたい」など、クロスソース分析全般。
単一外部ソースへの接続エントリポイントとしても使用（「FusionやEPM、ADB等に接続するNotebookを作りたい」など）：接続レシピはspark-connectors Pluginの aidp-<source> Skill（ステップ1）から取得します。このSkillはJoinを追加するだけです。ソース接続を手書きで実装しないでください。 （claude plugin list でspark-connectors Pluginがインストール済みか確認してください。 oracle_ai_data_platform_connectors ヘルパーパッケージを /Workspace/Shared にプッシュするため、 aidp-connectors-bootstrap Skillを一度実行してください — AIDP MCP経由、またはMCPがインスタンスに到達できない場合は手動で。）

動作の仕組み（新規サーフェスではなく、既存機能の組み合わせ）

1. ソース読み込み → spark-connectors Plugin

外部ソースごとに、oracle-ai-data-platform-workbench-spark-connectors の対応するConnector Skillを使用して（例: aidp-alh / aidp-oracle-db / aidp-exacs、aidp-fusion-bicc / aidp-fusion-rest、 aidp-snowflake、aidp-aws-s3、aidp-object-storage など）、各ソースの spark.read.format(...).option(...).load() レシピを取得します。このSkillはConnectorを再実装しません。

2. 1つのセッションでJoin

各ソースをDataFrameに読み込み、それぞれをTemp Viewとして登録したうえで、 spark.sql(...) でJoinを実行するセルを1つ作成します。これらはすべて、ヘルパーが作成する同一のSparkセッションで実行されます。 Joinキーは .aidp/semantic.md / .aidp/catalog.md から取得してください（推測は厳禁）。ヘルパーはapi_keyのDEFAULTプロファイルからUPSTを生成し、対象クラスター上にスクラッチNotebookを自動作成します：

python "$PLUGIN_DIR/scripts/aidp_sql.py" \
  --region <region> --datalake <DATALAKE_OCID> --workspace <ws> --cluster <cluster-key> \
  --code "
df_lake = spark.table('default.default.customer')
df_ext  = spark.read.format('...').option('...', '...').load()   # Connector SkillからのレシピをここへA
df_lake.createOrReplaceTempView('lake')
df_ext.createOrReplaceTempView('ext')
spark.sql('''SELECT ... FROM lake l JOIN ext e ON l.key = e.key ...''').show(50, truncate=False)
"

戻り値はJSON形式 {status, execution_count, outputs, spark_job_ids, error} です。 SQL/SparkエラーはErrorフィールドから読み取り、カタログに基づいて修正してください — 何度も推測しないでください。まず --code "spark.sql('SELECT 1').show()" で接続の疎通確認を行ってください。

3. 結果の提示

統合した結果を提示し、必要に応じて aidp-ingest-file-to-table / manage-tables でテーブルとして永続化するか、 aidp-verified-queries でJoinを保存します。

誠実な説明（情報の捏造禁止）

このSkillは**「1つのSparkセッションでのフェデレーション」**（各ソースをSparkに読み込んだうえでJoin）として説明してください。これはヘルパーが文字通り行っていることです：すべてのソース読み込みとJoinが、 1回の aidp_sql.py 実行の単一Sparkセッション内で動作します。対象環境での実証なしに、異種ソースをまたいだシングルクエリプッシュダウンフェデレーション を主張しないでください — その機能は未確認の可能性があるため、誇張は禁物です。
すべての読み込みは1回の実行/セルにまとめてセッションを共有してください。 2回目の aidp_sql.py 呼び出しは新しいセッションとなり、以前のTemp Viewは消滅します。
データ量に注意してください：大容量の外部読み込みは、Connectorがサポートしている場合はソース側でフィルタリング/プッシュダウンを行い、そうでない場合はサンプリングしてください。実行前にクラスターがRUNNING状態であることを確認してください（aidp-cluster-ops 参照）。

参照

spark-connectors Plugin（ソース読み込み） · scripts/aidp_sql.py — バンドル済みSpark-SQLエグゼキューター
references/semantic-model.md · references/no-mcp-rest-map.md · aidp-analyzing-data と連携

原文（English）を表示

`aidp-federate` — cross-source federation in one Spark session

Blend multiple sources by reading each into one Spark session and joining them — a signature differentiator (competitor agents degrade on foreign tables or need an external orchestrator + manual key-joins). Execution is via the bundled scripts/aidp_sql.py helper (no MCP, no AIDP_SESSION required).

When to use

"Join data from <source A> and <source B>", "combine Fusion + the lakehouse", "blend Snowflake with store_sales", any cross-source analysis.
Also the entry point for a SINGLE external source ("a notebook that connects to Fusion / EPM / ADB / …"): the connection recipe still comes from the spark-connectors plugin's aidp-<source> skill (step 1) — this skill just adds the join. Never hand-roll the source connection. (Check the spark-connectors plugin is installed via claude plugin list; run its aidp-connectors-bootstrap skill once to push the oracle_ai_data_platform_connectors helper package to /Workspace/Shared — via the AIDP MCP, or manually if the MCP can't reach the instance.)

How it works (composition, not new surface)

Source reads → spark-connectors plugin. For each external source, use the matching connector skill from oracle-ai-data-platform-workbench-spark-connectors (e.g. aidp-alh/aidp-oracle-db/aidp-exacs, aidp-fusion-bicc/aidp-fusion-rest, aidp-snowflake, aidp-aws-s3, aidp-object-storage, …) to get the spark.read.format(...).option(...).load() recipe for each source. This skill does not re-implement connectors.
Join in one session. Run a single cell that reads each source into a DataFrame, registers each as a temp view, and spark.sql(...) the join — all in the same Spark session created by the helper. Use join keys from .aidp/semantic.md / .aidp/catalog.md (don't guess). The helper mints a UPST from the api_key DEFAULT profile and auto-creates a scratch notebook on the target cluster:
```
python "$PLUGIN_DIR/scripts/aidp_sql.py" \
  --region <region> --datalake <DATALAKE_OCID> --workspace <ws> --cluster <cluster-key> \
  --code "
df_lake = spark.table('default.default.customer')
df_ext  = spark.read.format('...').option('...', '...').load()   # recipe from the connector skill
df_lake.createOrReplaceTempView('lake')
df_ext.createOrReplaceTempView('ext')
spark.sql('''SELECT ... FROM lake l JOIN ext e ON l.key = e.key ...''').show(50, truncate=False)
"
```
Returns JSON {status, execution_count, outputs, spark_job_ids, error}. Read the SQL/Spark error from the error field and fix grounded in the catalog — don't guess repeatedly. Smoke-test connectivity first with --code "spark.sql('SELECT 1').show()".
Present the blended result; optionally persist as a table (aidp-ingest-file-to-table / manage-tables) or save the join via aidp-verified-queries.

Honesty / framing (no-fabrication)

Describe this as "federate in one Spark session" (read each source into Spark, then join) — that is literally what the helper does: every source read and the join run in the single Spark session of one aidp_sql.py invocation. Do not claim single-query pushdown federation across heterogeneous sources unless verified live on the target environment — that capability is an open question, so don't overstate it.
Keep all reads in one invocation/cell so they share the session; a second aidp_sql.py call is a fresh session and the earlier temp views are gone.
Mind volume: large external reads should be filtered/pushed down at the source where the connector supports it; otherwise sample. Ensure the cluster is RUNNING before executing (see aidp-cluster-ops).

References

spark-connectors plugin (source reads) · scripts/aidp_sql.py — bundled Spark-SQL executor
references/semantic-model.md · references/no-mcp-rest-map.md · pairs with aidp-analyzing-data

原文・著作権は Anthropic および各プラグイン作者に帰属します。日本語訳は Claude API による自動翻訳です。