🚀model-deployment

Plugin: sagemaker-ai

Description

Generates code that deploys fine-tuned models from SageMaker Serverless Model Customization to SageMaker endpoints or Bedrock. Use when the user says "deploy my model", "create an endpoint", "make it available", or asks about deployment options. Identifies the correct deployment pathway (Nova vs OSS), generates deployment code, and handles endpoint configuration.

Use cases

✓ Deploy a fine-tuned model
✓ Create a SageMaker endpoint
✓ Deploy a model to Bedrock
✓ Ask about deployment options

Model Deployment

Identifies the correct deployment pathway based on model characteristics and generates deployment code.

Scope

This skill only supports deploying Nova and OSS models that were fine-tuned through SageMaker Serverless Model Customization.

Not supported:

Base models (not fine-tuned)
Models fine-tuned through other processes
Full Fine-Tuning (FFT) — only LoRA fine-tuned models are supported

Prerequisites

The SDK environment has been verified (SDK version, region, execution role). If it has not been verified, activate the sdk-getting-started skill first.

Principles

One thing at a time. Each response advances exactly one decision.
Confirm before proceeding. Wait for the user to agree before moving on. But don't re-ask questions already answered in the conversation — use what you know.
Don't read files until you need them. Only read pathway references after the pathway is confirmed.
Use what you know. If conversation history or artifacts already answer a question, confirm your understanding instead of asking again.

Workflow

Step 1: Identify the Training Job

You need the training job name or ARN. Check the conversation history first — the user may have already mentioned it, or it may be available from earlier steps in the workflow (e.g., fine-tuning). If not, ask the user.

Once you have the training job name or ARN, use the AWS MCP tool to look it up:

Use the AWS MCP tool describe-training-job and extract:
  - S3 output path (from ModelArtifacts.S3ModelArtifacts or OutputDataConfig.S3OutputPath)
  - IAM role ARN (from RoleArn)
  - Region
Use the AWS MCP tool list-tags on the training job ARN and extract:
  - Model ID from the sagemaker-studio:jumpstart-model-id tag
Determine the model type from the model ID:
  - Contains "nova" (nova-micro, nova-lite, nova-pro) → Nova
  - Llama, Mistral, Qwen, GPT-OSS, DeepSeek, etc. → OSS
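
The lookup above can be sketched in Python as follows. This is a minimal sketch, not the skill's own implementation: the skill uses the AWS MCP tool, whereas this version assumes a client object exposing the boto3 SageMaker methods `describe_training_job` and `list_tags`. The function names `classify_model_type` and `lookup_training_job` are hypothetical, the region is assumed to be readable from the training job ARN, and every model ID without "nova" is treated as OSS, which is a simplification.

```python
from typing import Any

MODEL_ID_TAG = "sagemaker-studio:jumpstart-model-id"


def classify_model_type(model_id: str) -> str:
    """Return "Nova" if the model ID contains "nova", otherwise "OSS" (simplified)."""
    return "Nova" if "nova" in model_id.lower() else "OSS"


def lookup_training_job(sm_client: Any, name_or_arn: str) -> dict:
    """Collect the Step 1 fields from a SageMaker-style client (assumed interface)."""
    # describe_training_job expects a job name, so keep only the last ARN segment.
    job_name = name_or_arn.split("/")[-1]
    job = sm_client.describe_training_job(TrainingJobName=job_name)
    job_arn = job["TrainingJobArn"]

    # Prefer the model artifact path; fall back to the configured output path.
    artifacts = job.get("ModelArtifacts") or {}
    s3_path = artifacts.get("S3ModelArtifacts") or job["OutputDataConfig"]["S3OutputPath"]

    # Only the first page of tags is read; many tags would need NextToken handling.
    tags = sm_client.list_tags(ResourceArn=job_arn).get("Tags", [])
    model_id = next((t["Value"] for t in tags if t["Key"] == MODEL_ID_TAG), None)

    return {
        "s3_output_path": s3_path,
        "role_arn": job["RoleArn"],
        # ARN layout: arn:aws:sagemaker:<region>:<account>:training-job/<name>
        "region": job_arn.split(":")[3],
        "model_id": model_id,
        "model_type": classify_model_type(model_id) if model_id else None,
    }
```

With boto3 installed, `sm_client` could be `boto3.client("sagemaker")`. A `model_type` of `None` means the tag was missing, in which case the model should be treated as unsupported rather than guessed.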

Unsupported models: This skill only supports OSS and Nova models that were LoRA fine-tuned through SageMaker Serverless Model Customization. If the model doesn't match, tell the user this skill can't help and suggest the finetuning skill.

Step 2: Determine Eligible Deployment Targets

Use the following table:

Model Type	Eligible Targets
OSS	SageMaker, Bedrock
Nova	SageMaker, Bedrock

If only one target is eligible, confirm it with the user. Use details from Step 3.

If multiple targets are eligible, help the user decide. Use details from Step 3.

If no targets are eligible, tell the user and explain why.
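
As an illustration, the table and the three cases above could be expressed as a small lookup. This is a sketch only: `ELIGIBLE_TARGETS`, `next_action`, and the returned action labels are hypothetical names chosen for this example.

```python
# Eligible deployment targets per model type, copied from the table above.
ELIGIBLE_TARGETS = {
    "OSS": ["SageMaker", "Bedrock"],
    "Nova": ["SageMaker", "Bedrock"],
}


def next_action(model_type: str) -> tuple[str, list[str]]:
    """Return a hypothetical action label plus the eligible targets for a model type."""
    targets = list(ELIGIBLE_TARGETS.get(model_type, []))
    if not targets:
        return "explain_no_targets", targets
    if len(targets) == 1:
        return "confirm_single_target", targets
    return "help_user_choose", targets
```

With the current table, both supported model types lead to the "help the user decide" case; the other two branches only matter if the table changes or the model type is unknown.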

Step 3: Let the User Choose a Deployment Target

Present the eligible options to the user. If both SageMaker and Bedrock are available, present these details to help them decide:

SageMaker Endpoint:

Dedicated compute resources for consistent performance
Control instance types and scaling
Best for predictable workloads with specific latency requirements

Bedrock:

Fully managed serverless inference
Auto-scales instantly with no capacity planning
Pay per request
Best for variable workloads with fluctuating demand

Do NOT make a recommendation. Let the user choose.

Do NOT mention technical details like merged/unmerged weights, reference files, or APIs, unless the user asks.

⏸ Wait for user to select a deployment option.

Step 4: Display License Agreement

Before proceeding to deployment, display the model's license or service terms to the user.

Read references/model-licenses.md and look up the model by its model ID (determined in Step 1).
Follow the instructions in the Notes column — use the exact phrasing provided.
If the model ID is not found in the table, warn the user that you could not find license information for their model and recommend they verify the license independently before proceeding.

⏸ Wait for the user to confirm before proceeding.

Step 5: Follow Pathway Workflow

Read the reference file for the selected pathway and follow its instructions.

Model Type	Deployment Target	Reference
OSS	SageMaker	`references/deploy-oss-sagemaker.md`
OSS	Bedrock	`references/deploy-oss-bedrock.md`
Nova	SageMaker	`references/deploy-nova-sagemaker.md`
Nova	Bedrock	`references/deploy-nova-bedrock.md`
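
The pathway table can be treated as a lookup keyed on model type and deployment target. The sketch below assumes nothing beyond the four file paths listed above; the names `REFERENCE_FILES` and `reference_for` are hypothetical.

```python
# (model type, deployment target) -> reference file, copied from the table above.
REFERENCE_FILES = {
    ("OSS", "SageMaker"): "references/deploy-oss-sagemaker.md",
    ("OSS", "Bedrock"): "references/deploy-oss-bedrock.md",
    ("Nova", "SageMaker"): "references/deploy-nova-sagemaker.md",
    ("Nova", "Bedrock"): "references/deploy-nova-bedrock.md",
}


def reference_for(model_type: str, target: str) -> str:
    """Return the reference file for a confirmed pathway, or raise for unknown pairs."""
    try:
        return REFERENCE_FILES[(model_type, target)]
    except KeyError:
        raise ValueError(f"No pathway for {model_type!r} on {target!r}") from None
```

Resolving the path only after the user has chosen a target keeps this step consistent with the principle of not reading reference files before the pathway is confirmed.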

Step 6: Post-Deployment Summary

After deployment completes, provide the user with a summary. Cover these topics, using details from the pathway reference doc you followed in Step 5:

What was deployed — endpoint or model name, ARN, status
How to use it — sample invoke code for the specific deployment target
Cost — billing model (instance-based vs. pay-per-request) and what to expect
Cleanup — how to delete the endpoint or model when done
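
One way to keep the summary consistent is to collect the four topics in a small structure before presenting them. This is a rough sketch: `DeploymentSummary` and its fields are hypothetical, and the billing mapping is inferred from the target descriptions in Step 3 rather than taken from the pathway reference docs, which remain the source for actual details.

```python
from dataclasses import dataclass

# Billing model per target, inferred from Step 3 (an assumption of this sketch).
BILLING_MODEL = {"SageMaker": "instance-based", "Bedrock": "pay-per-request"}


@dataclass
class DeploymentSummary:
    target: str          # "SageMaker" or "Bedrock"
    name: str            # endpoint or model name
    arn: str
    status: str
    invoke_snippet: str  # sample invoke code taken from the pathway reference doc
    cleanup_steps: str   # deletion instructions taken from the pathway reference doc

    def render(self) -> str:
        """Return the four summary topics as plain text, one per line."""
        return "\n".join(
            [
                f"What was deployed: {self.name} ({self.arn}), status {self.status}",
                f"How to use it: {self.invoke_snippet}",
                f"Cost: {BILLING_MODEL[self.target]} billing",
                f"Cleanup: {self.cleanup_steps}",
            ]
        )
```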

Troubleshooting

How to check if a model was LoRA or FFT fine-tuned

If deployment fails unexpectedly, the model may have been fully fine-tuned (FFT) rather than LoRA fine-tuned. To check, download the training job's Hydra config, located at .hydra/config.yaml under its S3 output path:

peft_config populated (r, alpha, dropout, etc.) → LoRA (supported)
peft_config: null → FFT (not supported by this skill)
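
Once the config has been downloaded and parsed into a dictionary, the check above can be sketched as below. The nesting of `peft_config` inside the config is not specified here, so this sketch assumes it may appear at any depth and searches for it recursively. The helper names are hypothetical, and an empty `peft_config` is reported as "unknown" because the rule above does not cover that case.

```python
from typing import Any

_MISSING = object()  # sentinel that distinguishes "key absent" from "key set to null"


def find_key(node: Any, key: str) -> Any:
    """Depth-first search for `key` in nested dicts and lists; _MISSING if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return _MISSING
    for child in children:
        found = find_key(child, key)
        if found is not _MISSING:
            return found
    return _MISSING


def detect_finetuning_method(config: dict) -> str:
    """Return "LoRA", "FFT", or "unknown" based on the peft_config entry."""
    value = find_key(config, "peft_config")
    if value is None:
        return "FFT"   # peft_config: null
    if value is _MISSING or not value:
        return "unknown"  # key absent or empty
    return "LoRA"      # peft_config populated (r, alpha, dropout, etc.)
```

The dictionary could come from `yaml.safe_load` on the downloaded file, assuming PyYAML is available in the environment.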

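The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was produced automatically with the Claude API.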