Skill · Official · development

📥 aidp-ingest-file-to-table

Plugin: oracle-ai-data-platform-workbench-engineer-agent
Source: GitHub

Description

Loads a data file (CSV / JSON / Parquet, etc.) into a managed AIDP Delta table. Use when the user wants to ingest a file into a table, create a table from a file, or land raw data in the lakehouse. Supports both the 1-step path and the 3-step upload → infer schema → create path. Control-plane operations use the official `aidp` CLI.

Original text

Load a data file (CSV/JSON/Parquet/etc.) into a managed AIDP Delta table. Use when the user wants to ingest a file into a table, create a table from a file, or land raw data in the lakehouse. Supports the 1-step path and the 3-step upload→infer→create path. Control-plane via the official `aidp` CLI.

Use cases

✓ Ingest a file into a table
✓ Create a table from a file
✓ Land raw data in the lakehouse

Body (translated)

`aidp-ingest-file-to-table`: file → managed Delta table

Lands a file into a managed AIDP table. You can either finish in a single call or use the staged 3-step flow when you want to review and adjust the inferred schema. This is a control-plane flow on the DataLake schema/tables resource. Primary engine: the official Oracle aidp CLI (same REST API + auth). oci raw-request is the fallback when the CLI is not installed.

When to use

"Load this CSV/JSON into a table", "create a table from <file>", "ingest <file> into the lakehouse".

CLI (preferred)

Steps per references/aidp-cli-map.md: schema generate-temp-file-upload-target → schema infer / infer-with-preview → schema create-data-table / create-table (schema retrieve-par is also available).

All commands take --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region <r>.

# 3-step (control): stage → infer → create
aidp schema generate-temp-file-upload-target --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1   # returns the upload target / PAR (also: retrieve-par)
aidp schema infer-with-preview              --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1   # review columns / types / preview (or: infer)
aidp schema create-data-table --body-file .aidp/payloads/create-data-table-<name>.json \
  --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1                                            # or: create-table

Mutating operations (create-data-table / create-table, upload): save the body to .aidp/payloads/ and get the user's confirmation before running (see references/payloads.md).

Fallback (no CLI): use the same REST + auth via oci raw-request, sending requests to …/20240831/dataLakes/<OCID>/… (auth ladder in references/oci-raw-request.md):

POST /tables/actions/uploadDataFile (multipart/binary may need a PAR upload; see aidp-volumes)
POST /tables/actions/inferSchema
POST /tables/actions/createTable (with catalogKey, schemaKey, table name, finalized columns, source format, load options)
Verify: GET /tables?catalogKey=<cat>&schemaKey=<cat.schema>

Verify first (no fabrication): the shapes of the upload / infer / create actions are UNVERIFIED in this environment (not yet in references/rest-endpoint-map.md). Confirm them with a live probe before any write (start by checking that GET /tables?catalogKey=…&schemaKey=… returns 200 against the target schema). Record the results.

Live-verified on de-agent on 2026-06-10 (CSV → de_ingest_test, 3 rows). Correction: the uploadDataFile / inferSchema / createTable action names above are WRONG. The working flow is the 3-step flow on the schema resource:

generate-temp-file-upload-target returns a PAR and an ociFilePath

PUT the file bytes to the PAR (HTTP 200)

infer-with-preview: its location must be the ociFilePath OCI URI (not the uploadKey; passing uploadKey returns a 400 error)

create-data-table returns 202 + a datalake-async-operation-key (poll until SUCCEEDED)

create-data-table is headerless / positional: header=true is ignored at create time, so tableFields must use the reader column names (_c0 / _c1 / _c2 …). Naming the fields id / name / amt makes the async job fail with an UNRESOLVED_COLUMN error. Rename afterward with ALTER TABLE … RENAME COLUMN.

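Workflow

Confirm the source file location (workspace path or volume) and the target catalog.schema.table (create the schema first if needed).
1-step (simple): aidp schema create-table referencing the source file, format, and options. This is the fastest path when the schema infers cleanly.
3-step (control): generate-temp-file-upload-target → infer-with-preview (review columns / types with the user; fix types, headers, and delimiters) → create-data-table with the finalized columns.
Async: table creation may return 202 with an async-operation key. Poll until it reaches a terminal state (async convention in references/oci-raw-request.md; track via aidp-observability).
Verify with aidp schema list-tables / GET /tables?… and report the fully-qualified table name and a row / column summary.

Known limitations (no workaround)

Delimited files: comma only. Auto-populate "doesn't support delimiters other than comma" (platform reference §42 Known Issues #15). Convert tab / pipe / semicolon-delimited files to CSV before ingest.
No multi-line JSON for external tables: "Can't create external tables with multi-line JSON" (platform reference §42 Known Issues #12). Use newline-delimited JSON (one record per line) for external tables.

Notes

Big files: prefer landing the file in a volume or object storage first and loading from there. Mind cluster memory.
For continuous / streaming ingestion or ingestion from external sources, use the spark-connectors plugin and aidp-federate instead of this skill (this skill is file → table only).
Always delete temporary tables created during validation.

References

references/aidp-cli-map.md · references/payloads.md · references/oci-raw-request.md · references/rest-endpoint-map.md
Related skills: aidp-workspace-files, aidp-volumes, aidp-profiling-tables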

Original (English)

`aidp-ingest-file-to-table` — file → managed Delta table

Land a file into a managed AIDP table, either in one call or via the staged 3-step flow when you need to review/adjust the inferred schema. This is a control-plane flow on the DataLake schema/tables resource. Primary engine: the official Oracle aidp CLI (same REST API + auth); oci raw-request is the fallback when the CLI isn't installed.

When to use

"Load this CSV/JSON into a table", "create a table from <file>", "ingest <file> into the lakehouse".

CLI (preferred)

Per references/aidp-cli-map.md: schema generate-temp-file-upload-target → schema infer / infer-with-preview → schema create-data-table / create-table (also schema retrieve-par). All commands take --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region <r>.

# 3-step (control): stage → infer → create
aidp schema generate-temp-file-upload-target --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1   # returns upload target / PAR (also: retrieve-par)
aidp schema infer-with-preview              --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1   # review columns/types/preview (or: infer)
aidp schema create-data-table --body-file .aidp/payloads/create-data-table-<name>.json \
  --instance-id <DATALAKE_OCID> --auth api_key --profile DEFAULT --region us-ashburn-1                                            # or: create-table

Mutating ops (create-data-table/create-table, upload): persist the body to .aidp/payloads/ and confirm with the user before running (see references/payloads.md).
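
The persist-then-confirm rule above could be sketched in Python roughly as follows. This is a minimal illustration, not part of the skill: the helper names persist_payload and confirmed are hypothetical, and the example body keys are placeholders, since the real create-data-table body shape is defined in references/payloads.md and is not reproduced here. Only the .aidp/payloads/create-data-table-<name>.json file name pattern comes from the command shown above.

```python
import json
from pathlib import Path


def persist_payload(name, body, root=".aidp/payloads"):
    """Write the request body to <root>/create-data-table-<name>.json.

    Hypothetical helper: the body is stored exactly as given so the user
    can review the same bytes that --body-file will later send.
    """
    path = Path(root) / f"create-data-table-{name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(body, indent=2, sort_keys=True) + "\n",
                    encoding="utf-8")
    return path


def confirmed(answer):
    """Treat only an explicit yes as confirmation; anything else is a no."""
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    # Placeholder body; the key below is an assumption for illustration only.
    saved = persist_payload("orders", {"tableName": "orders"})
    print(f"Payload saved to {saved}")
    if confirmed(input("Run create-data-table with this payload? [y/N] ")):
        print("Confirmed: run the aidp command with --body-file", saved)
    else:
        print("Not confirmed: nothing was executed.")
```

Defaulting to "no" on empty input keeps an accidental Enter keypress from triggering a write.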

Fallback (no CLI): same REST + auth via oci raw-request against …/20240831/dataLakes/<OCID>/… (auth ladder in references/oci-raw-request.md):

POST /tables/actions/uploadDataFile (multipart/binary may need PAR upload; see aidp-volumes)
POST /tables/actions/inferSchema
POST /tables/actions/createTable (with catalogKey, schemaKey, table name, finalized columns, source format, load options)
Verify: GET /tables?catalogKey=<cat>&schemaKey=<cat.schema>

Verify-first (no-fabrication): the upload/infer/create action shapes are UNVERIFIED in this env (not yet in references/rest-endpoint-map.md). Confirm with a live probe (start with a GET /tables?catalogKey=…&schemaKey=… 200 against the target schema) before any write; record results.

Live-verified 2026-06-10 on de-agent (CSV → de_ingest_test, 3 rows). Correction: the uploadDataFile / inferSchema / createTable action names above are WRONG. The working flow is the schema-resource 3-step:

(1) generate-temp-file-upload-target returns a PAR + ociFilePath.
(2) PUT the file bytes to the PAR (HTTP 200).
(3) infer-with-preview: its location MUST be the ociFilePath OCI URI, not the uploadKey (passing uploadKey → 400).
(4) create-data-table returns 202 + a datalake-async-operation-key (poll to SUCCEEDED).

create-data-table is HEADERLESS/POSITIONAL: header=true is ignored at create, so tableFields must use the reader column names _c0/_c1/_c2…; naming them id/name/amt fails the async op with UNRESOLVED_COLUMN. Rename afterward via ALTER TABLE … RENAME COLUMN.
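
The positional naming rule can be illustrated with a small Python sketch. It assumes each tableFields entry is an object with "name" and "type" keys; that entry shape is a guess for illustration and should be checked against references/payloads.md. The helper names and the cat.sch catalog/schema prefix are likewise hypothetical. Only the _c0/_c1/_c2 names, the tableFields key, the de_ingest_test table, and the ALTER TABLE … RENAME COLUMN step come from the text above.

```python
def positional_fields(types):
    """Build tableFields entries named _c0, _c1, ... in file column order.

    ASSUMPTION: the {"name": ..., "type": ...} shape is illustrative only.
    """
    return [{"name": f"_c{i}", "type": t} for i, t in enumerate(types)]


def rename_statements(table, names):
    """Emit one ALTER TABLE ... RENAME COLUMN statement per column,
    mapping the positional reader name to the intended name."""
    return [
        f"ALTER TABLE {table} RENAME COLUMN _c{i} TO {name}"
        for i, name in enumerate(names)
    ]


if __name__ == "__main__":
    # Fields to send at create time (positional names, not id/name/amt).
    print(positional_fields(["int", "string", "double"]))
    # Statements to run only after the async operation reports SUCCEEDED.
    for stmt in rename_statements("cat.sch.de_ingest_test", ["id", "name", "amt"]):
        print(stmt)
```

Keeping the intended names in a separate list means the create payload stays positional while the rename step still records what each column was meant to be.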

Workflow

Confirm the source file location (workspace path or volume) and the target catalog.schema.table (create the schema first if needed).
1-step (simple): aidp schema create-table referencing the source file, format, and options — fastest when the schema infers cleanly.
3-step (control): generate-temp-file-upload-target → infer-with-preview (review columns/types with the user; fix types/headers/delimiters) → create-data-table with the finalized columns.
Async: table creation may return 202 with an async-operation key — poll until terminal (async convention in references/oci-raw-request.md; track via aidp-observability).
Verify with aidp schema list-tables / GET /tables?…; report the fully-qualified table name and row/column summary.
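
The async step in the workflow above amounts to a poll-until-terminal loop. The Python sketch below shows one way to structure it under stated assumptions: get_status is a caller-supplied function (hypothetical) that looks up the operation by its async-operation key and returns a status string, and the terminal set other than SUCCEEDED (FAILED, CANCELED) is a guess; the authoritative convention is in references/oci-raw-request.md.

```python
import time

# SUCCEEDED comes from the live-verified flow; FAILED and CANCELED are
# ASSUMED terminal states and should be checked against the async convention.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED"}


def poll_until_terminal(get_status, interval_s=2.0, timeout_s=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or time runs out.

    get_status is a hypothetical zero-argument callable supplied by the
    caller; sleep and clock are injectable so the loop can be tested
    without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"operation still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A caller would treat any terminal state other than SUCCEEDED as a failure and report it rather than proceed to verification.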

Gotchas (documented limits, no workaround)

Delimited files: comma only — auto-populate "Doesn't support delimiters other than comma" (platform reference §42 Known Issues #15). Pre-convert tab/pipe/semicolon-delimited files to CSV before ingest.
No multi-line JSON for external tables — "Can't create external tables with multi-line JSON" (platform reference §42 Known Issues #12). Use newline-delimited JSON (one record per line) for external tables.
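
Both limits above call for a pre-conversion step before ingest. A minimal Python sketch using only the standard library is shown here; the function names and file paths are hypothetical, it assumes UTF-8 input, and the JSON helper assumes the multi-line file holds a single top-level array of records.

```python
import csv
import json


def delimited_to_csv(src, dst, delimiter):
    """Rewrite a tab/pipe/semicolon-delimited file as comma-separated CSV.

    The csv module quotes any field that itself contains a comma, so
    values survive the delimiter change intact.
    """
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin, delimiter=delimiter):
            writer.writerow(row)


def json_array_to_ndjson(src, dst):
    """Rewrite a multi-line JSON array as newline-delimited JSON,
    one record per line."""
    with open(src, encoding="utf-8") as fin:
        records = json.load(fin)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst, "w", encoding="utf-8") as fout:
        for record in records:
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, delimited_to_csv("orders.tsv", "orders.csv", "\t") would produce a comma-delimited copy to ingest in place of the tab-delimited original. The JSON helper loads the whole array into memory, so it suits small and medium files only.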

Notes

Big files: prefer landing into a volume / object storage and loading from there; mind cluster memory.
For continuous/streaming or external-source ingestion, use the spark-connectors plugin + aidp-federate, not this skill (this is file→table).
Clean up temporary tables created during validation.

References

references/aidp-cli-map.md · references/payloads.md · references/oci-raw-request.md · references/rest-endpoint-map.md
pairs with aidp-workspace-files, aidp-volumes, aidp-profiling-tables

The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation was produced automatically by the Claude API.