
📚aidp-knowledge-bases

Plugin: oracle-ai-data-platform-workbench-engineer-agent

Description

Build and manage AIDP Knowledge Bases for RAG — create a KB over catalog data, pick an embedding model + chunking, build an HNSW/IVF vector index, add data sources (volume/table), run ingestion jobs, and manage KB permissions. Use when the user wants RAG, a knowledge base, document/vector search, embeddings, semantic retrieval, or to feed a RAG_TOOL agent-flow node. This is the corpus/index layer; the agent that queries it is authored in aidp-agent-flows (RAG_TOOL) or aidp-agent-highcode.

Use cases

✓ Build a knowledge base for a RAG system
✓ Run document/vector search
✓ Perform semantic search
✓ Feed data to a RAG_TOOL node
✓ Manage vector indexes

`aidp-knowledge-bases` — RAG corpus + vector index

A Knowledge Base is AIDP's managed RAG store: it embeds source data into a vector index you can retrieve from (directly, or via a RAG_TOOL node in an agent flow). Runs over the LA AgentFlows family REST API.

Engine: oci raw-request --profile DEFAULT (v1.0.0 has no CLI group for KBs). The API is lake-scoped. Live-verified 2026-06-10: GET …/dataLakes/<ocid>/knowledgeBases?catalogKey=<key>&schemaKey=<key> → 400 InvalidParameter, which means the route exists but needs real catalog/schema keys (resolve them via aidp-catalog-explore). The workspace-scoped path returns 404. Confirm with a live read before any write, and record the result in references/rest-endpoint-map.md.

Correction, live-verified 2026-06-10 on de-agent: a bare GET …/knowledgeBases (no query string) also returns 400 InvalidParameter, because both the schemaKey and catalogKey query parameters are required. The route is provisioned (lake-scoped); the 400 comes from the missing parameters, not from a missing route.
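
The read check described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the plugin's implementation: `base_url` stands in for the elided API prefix (`…`) and must be supplied by the caller, and all three helper names are hypothetical. The `oci raw-request` options used (`--http-method`, `--target-uri`, `--profile`) are standard OCI CLI flags; the command is only assembled here, never executed.

```python
from urllib.parse import urlencode


def kb_list_url(base_url: str, lake_ocid: str, catalog_key: str, schema_key: str) -> str:
    """Build the lake-scoped list URL; both query parameters are required."""
    if not catalog_key or not schema_key:
        raise ValueError("catalogKey and schemaKey are both required")
    query = urlencode({"catalogKey": catalog_key, "schemaKey": schema_key})
    # base_url is an assumption: it replaces the prefix the source elides as "…".
    return f"{base_url.rstrip('/')}/dataLakes/{lake_ocid}/knowledgeBases?{query}"


def raw_request_argv(url: str, profile: str = "DEFAULT") -> list:
    """Assemble (but do not run) the oci raw-request command line for a GET."""
    return ["oci", "raw-request", "--http-method", "GET",
            "--target-uri", url, "--profile", profile]


def interpret_status(status: int) -> str:
    """Map the status codes reported in the text to their meaning."""
    if status == 400:
        return "route exists; supply real catalogKey and schemaKey"
    if status == 404:
        return "route not found; check that the path is lake-scoped, not workspace-scoped"
    if 200 <= status < 300:
        return "read confirmed; record it in references/rest-endpoint-map.md"
    return "unexpected status; investigate before any write"
```

Keeping URL construction separate from execution makes it easy to show the exact request for review before anything is sent.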

When to use

"Build a RAG / knowledge base / vector index", "embed these docs/tables", "semantic search over X", or any prerequisite for a RAG_TOOL agent-flow node.
NOT ad-hoc LLM-in-SQL (→ aidp-ai-sql); NOT the agent that uses the KB (→ aidp-agent-flows / aidp-agent-highcode).

Create a KB (`CreateKnowledgeBaseDetails`, camelCase wire fields)

POST …/dataLakes/<ocid>/knowledgeBases

{
  "displayName": "policy_kb",
  "description": "RAG over policy docs",
  "catalogKey": "<catalog-key>", "schemaKey": "<schema-key>",
  "workspaceKey": "<ws-key>", "clusterKey": "<cluster-key>",
  "type": "...", "modality": "...",
  "embeddingModelSourceType": "...", "embeddingModelName": "<embedding-model>",
  "chunkSize": 512, "chunkOverlap": 64,
  "sourceFilePattern": "*.pdf",
  "indexDetails": { "type": "HNSW", "distance": "COSINE", "neighbors": 32, "efConstruction": 200, "targetAccuracy": 95 }
}
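
To make `chunkSize` and `chunkOverlap` concrete, here is a generic sliding-window sketch in Python. It is an illustration only: the source does not say which unit AIDP counts (characters or tokens) or how its chunker handles boundaries, and `chunk_spans` is a hypothetical helper, not part of any SDK.

```python
def chunk_spans(length: int, chunk_size: int = 512, chunk_overlap: int = 64) -> list:
    """Return (start, end) spans of a generic sliding window over `length` units.

    Assumption: consecutive chunks share exactly `chunk_overlap` units.
    AIDP's real chunking behavior may differ.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < length:
        end = min(start + chunk_size, length)
        spans.append((start, end))
        if end == length:  # the last chunk reached the end of the input
            break
        start += step
    return spans
```

With the example values, each chunk starts 448 units after the previous one, so neighboring chunks share 64 units.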

indexDetails.type is HNSW or IVF. Both index types accept the full 7-value distance enum: COSINE | DOT | EUCLIDEAN | HAMMING | JACCARD | L2_SQUARED | MANHATTAN (SDK KbVHnswIndexDetails / KbVIvfIndexDetails distance setter allowed_values: kb_v_hnsw_index_details.py:110, kb_v_ivf_index_details.py:110). The tuning values in the example body are illustrative.

HNSW tuning params (SDK KbVHnswIndexDetails, camelCase wire fields, all int — kb_v_hnsw_index_details.py:74-79):

| Field | Meaning | SDK line |
| --- | --- | --- |
| `neighbors` | max neighbors each vector can have on any layer (the HNSW M parameter) | `:146` |
| `efConstruction` | max closest-vector candidates considered during index construction | `:170` |
| `targetAccuracy` | target accuracy percentage 1–100 | `:122` |

IVF tuning params (SDK KbVIvfIndexDetails, camelCase wire fields, all int — kb_v_ivf_index_details.py:74-79):

| Field | Meaning | SDK line |
| --- | --- | --- |
| `neighborPartitions` | number of partitions (clusters) to divide the vector data into | `:146` |
| `neighborPartitionProbes` | max partitions to probe during a search (higher = more accurate, slower) | `:170` |
| `targetAccuracy` | target accuracy percentage 1–100 | `:122` |
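
The constraints on `indexDetails` listed above can be checked locally before a body is shown for confirmation. The following Python sketch is a hypothetical lint, not an SDK call. It assumes the tuning fields are optional and that a field not documented here for the chosen index type is a mistake; the service itself may behave differently.

```python
DISTANCES = {"COSINE", "DOT", "EUCLIDEAN", "HAMMING", "JACCARD", "L2_SQUARED", "MANHATTAN"}
TUNING_FIELDS = {
    "HNSW": ("neighbors", "efConstruction", "targetAccuracy"),
    "IVF": ("neighborPartitions", "neighborPartitionProbes", "targetAccuracy"),
}


def index_details_problems(details: dict) -> list:
    """Return human-readable problems found in an indexDetails dict (empty if none)."""
    index_type = details.get("type")
    if index_type not in TUNING_FIELDS:
        return [f"type must be HNSW or IVF, got {index_type!r}"]
    problems = []
    if details.get("distance") not in DISTANCES:
        problems.append(f"distance {details.get('distance')!r} is not in the allowed enum")
    # Flag fields that belong to the other index type or are unknown.
    allowed = set(TUNING_FIELDS[index_type]) | {"type", "distance"}
    for name in sorted(set(details) - allowed):
        problems.append(f"{name} is not a documented {index_type} field")
    # All tuning parameters are documented as int (bool is rejected explicitly).
    for name in TUNING_FIELDS[index_type]:
        value = details.get(name)
        if name in details and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{name} must be an int")
    accuracy = details.get("targetAccuracy")
    if isinstance(accuracy, int) and not isinstance(accuracy, bool) and not 1 <= accuracy <= 100:
        problems.append("targetAccuracy must be between 1 and 100")
    return problems
```

An empty list means the dict is consistent with the documented enum and types; it does not prove the service will accept it.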

embeddingModelName / embeddingModelSourceType: list the available embedding models via aidp-models-catalog (modelType=EMBEDDING). Embedding compute needs a clusterKey whose cluster is RUNNING.
Verify-first: the field names above are from the SDK CreateKnowledgeBaseDetails; the create was not round-tripped (it triggers embedding compute). Confirm the embedding-model + type/modality enums against a live read / aidp help before a production create — do not invent values.
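
One way to enforce the verify-first rule is to refuse any body that still contains template placeholders. The Python sketch below is an assumption-laden illustration: `unresolved_fields` is a hypothetical helper, and it treats a value as unresolved only when it is exactly `...` or an angle-bracket token such as `<catalog-key>`, the two placeholder styles used in the example body.

```python
import re

# Matches the two placeholder styles used in the example body: "..." and "<name>".
PLACEHOLDER = re.compile(r"^(\.\.\.|<[^<>]+>)$")


def unresolved_fields(body: dict, prefix: str = "") -> list:
    """Return dotted paths of string values that are still template placeholders."""
    unresolved = []
    for key, value in body.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as indexDetails.
            unresolved.extend(unresolved_fields(value, prefix=f"{path}."))
        elif isinstance(value, str) and PLACEHOLDER.match(value.strip()):
            unresolved.append(path)
    return unresolved
```

If the returned list is not empty, resolve those fields from a live read rather than guessing.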

Ingest + maintain

| Action | Endpoint / body |
| --- | --- |
| Add/remove a data source | `UpdateKnowledgeBaseAddSourceDetails` / `…DeleteSourceDetails` (source kind volume / table) |
| Run an ingestion job | `POST …/knowledgeBases/<key>/jobs` with `CreateKnowledgeBaseJobDetails {displayName, type, goal, sources, sourceKey, schedule}`; trigger runs via `…/jobs/<key>/jobRuns` |
| List job runs / status | `GET …/knowledgeBases/<key>/jobs/<key>/jobRuns` |
| Permissions | `assign`/`manage`/`revoke` KnowledgeBasePermission (see `aidp-roles-access` for principals) |
| Update / delete KB | `PUT`/`DELETE …/knowledgeBases/<key>` |
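
The job routes above follow a simple nesting, sketched here in Python. The helper names are hypothetical, and `kb_path` is the caller-supplied `…/knowledgeBases/<key>` path, whose elided prefix this sketch does not try to fill in. The field list comes from `CreateKnowledgeBaseJobDetails` as quoted above; allowed values for `type`, `goal`, and `schedule` are not given in the source, so none are suggested.

```python
from typing import Optional

# Field names quoted from CreateKnowledgeBaseJobDetails in the text above.
JOB_FIELDS = ("displayName", "type", "goal", "sources", "sourceKey", "schedule")


def kb_job_paths(kb_path: str, job_key: Optional[str] = None) -> dict:
    """Derive the jobs path, and the jobRuns path when a job key is known."""
    jobs = f"{kb_path.rstrip('/')}/jobs"
    paths = {"create_job": jobs}
    if job_key is not None:
        paths["job_runs"] = f"{jobs}/{job_key}/jobRuns"
    return paths


def unknown_job_fields(body: dict) -> list:
    """List keys in a job body that are not documented job fields."""
    return sorted(set(body) - set(JOB_FIELDS))
```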

Wire it to an agent

Once the index is built, reference the KB from a RAG_TOOL node (aidp-agent-flows, references/agent-flow-nodes.md) or from high-code (aidp-agent-highcode). The KB must exist + be ingested before the RAG tool can retrieve.

Guardrails

Mutation gate: KB create/ingest consumes embedding compute — show the body, confirm first, persist to .aidp/payloads/ (references/payloads.md).
Resolve catalogKey/schemaKey/clusterKey (real keys) first via aidp-catalog-explore / aidp-cluster-ops.
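
A mutation gate along these lines could look like the Python sketch below. It is a hypothetical helper, not the plugin's code: the `<name>.json` file naming and the choice to persist only after confirmation are assumptions, and references/payloads.md remains the authority on the actual convention.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def gate_mutation(body: dict, name: str, confirm: Callable[[str], bool],
                  payload_dir: str = ".aidp/payloads") -> Optional[Path]:
    """Show the body, ask for confirmation, and persist it only when approved."""
    rendered = json.dumps(body, indent=2, sort_keys=True)
    if not confirm(rendered):
        return None  # declined: nothing is written and nothing should be sent
    # Assumption: one JSON file per payload, named after the caller-chosen name.
    target = Path(payload_dir) / f"{name}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(rendered + "\n", encoding="utf-8")
    return target
```

Passing `confirm` as a callback keeps the gate testable; an interactive caller could supply a function that prints the body and reads a yes/no answer.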

References

aidp-agent-flows (RAG_TOOL consumer) · aidp-models-catalog (embedding models) · aidp-catalog-explore (keys)
references/oci-raw-request.md · references/rest-endpoint-map.md · references/payloads.md

The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation was machine-generated by the Claude API.