Skill · Official · development

🎯finetuning

Plugin: sagemaker-ai

Description

Generates code that fine-tunes a base model using SageMaker serverless training jobs. Use when the user says "start training", "fine-tune my model", "I'm ready to train", or when the plan reaches the finetuning step. Supports SFT, DPO, RLVR, and RLAIF trainers, including RLVR Lambda reward function and RLAIF custom prompt creation.

Use cases

✓ When you want to fine-tune a base model
✓ When starting a training job on SageMaker
✓ When running SFT, DPO, RLVR, or similar training
✓ When the plan reaches the fine-tuning step

Prerequisites

Before starting this workflow, verify:

A use_case_spec.md file exists
- If missing: Activate the use-case-specification skill first, then resume
- DON'T EVER offer to create a use case spec without activating the use-case-specification skill.
A fine-tuning technique (SFT, DPO, RLVR, RLAIF, or CPT/RFT (for Nova)) and base model have already been selected
- If missing: Activate the model-selection and/or finetuning-technique skills to collect what's missing, then resume
- Don't make recommendations on the spot. You MUST activate the appropriate skill.
A base model name available on SageMakerHub has been identified
- If missing: Activate the model-selection skill to get it
- Important: Only use the model name that model-selection retrieves, as it may differ from other commonly used names for the same model
The SDK environment has been verified (SDK version, region, execution role)
- If not done: Activate the sdk-getting-started skill first, then resume
A training dataset has been uploaded to an S3 bucket in the environment's default region
- If not met: Help the user upload the dataset to the correct S3 bucket

Critical Rules

Code Generation Rules

✅ Use EXACTLY the imports shown in each code template
❌ Do NOT add additional imports even if they seem helpful
❌ Do NOT create variables before they're needed in that section
📋 Copy the code structure precisely - no improvisation
🎯 Follow the minimal code principle strictly
✅ When writing code, make sure the indentation and f-strings are correct

User Communication Rules

❌ NEVER offer to move on to a downstream skill while training is in progress (logically impossible)
❌ NEVER set ACCEPT_EULA to True without explicit user confirmation in the conversation
✅ Always mention both the number AND title of sections you reference
✅ If user asks how to run (notebook): If run_cell is available, offer to run it. Otherwise, tell them to run cells one by one (mention ipykernel requirement).
✅ If user asks how to run (script): Tell them to run with python3 <script>.py

Workflow

1. Code Generation Setup

1.1 Directory Setup

Identify project directory from conversation context
- If unclear (multiple relevant directories exist) → Ask user which folder to use
- If no project directory exists → activate the directory-management skill to set one up

⏸ Wait for user.

1.2 Select Code Template

Read references/code_output_guide.md for output format rules, then read the code template matching the finetuning strategy:

SFT → code_templates/sft.py
DPO → code_templates/dpo.py
RLVR → code_templates/rlvr.py
RLAIF with built-in rewards → code_templates/rlaif_builtin.py
RLAIF with custom prompt → code_templates/rlaif_custom_prompt.py

The template is a Python file where each # Cell N: Label comment marks the start of a new section. Split on these markers — everything between one marker and the next becomes one unit of output.
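
The marker-based split described above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: the function name split_cells and the cell numbers in the sample are assumptions, and the sketch drops the marker line itself from each unit and ignores anything before the first marker.

```python
import re

# Matches marker lines such as "# Cell 3: Configure Trainer".
CELL_MARKER = re.compile(r"^# Cell (\d+): (.+)$")


def split_cells(template_source: str) -> list[tuple[int, str, str]]:
    """Return (number, label, body) for each marked section, in file order."""
    cells = []
    current = None  # [number, label, body_lines] for the section being read

    def close(section):
        number, label, body_lines = section
        cells.append((number, label, "\n".join(body_lines).strip()))

    for line in template_source.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            if current is not None:
                close(current)
            current = [int(match.group(1)), match.group(2).strip(), []]
        elif current is not None:
            current[2].append(line)
        # Lines before the first marker are ignored in this sketch.
    if current is not None:
        close(current)
    return cells


# Hypothetical two-cell template used only for illustration.
sample = '''# Cell 1: Setup & Credentials
BASE_MODEL = "example-model"

# Cell 2: Configure Trainer
print(BASE_MODEL)
'''
cells = split_cells(sample)
for number, label, body in cells:
    print(number, label, repr(body))
```

Each tuple corresponds to one unit of output; how a unit is rendered (notebook cell or script section) is governed by code_output_guide.md.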

1.3 Generate Code

Write the code from the template following the rules in code_output_guide.md
Use the same order, dependencies, and imports as the template
DO NOT improvise or add extra code
If the model is NOT a Meta/Llama model (model ID does NOT start with meta-):
- Omit the ACCEPT_EULA = False line from the config cell
- Omit the accept_eula=ACCEPT_EULA, line from the trainer call
If the model is from the Nova family, omit any code containing max_epochs or lr_warmup_steps_ratio from the Configure Trainer section and the Hyperparameter Overrides section
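
The omission rules above can be sketched as a line filter. This is a rough illustration under stated assumptions: filter_template_lines is a hypothetical helper, the Nova check is passed in as a flag because the text does not say how the Nova family is detected, and the sample lines (including the Trainer name) are invented. The caller is assumed to pass only lines from the relevant sections, and statements spanning several lines are not handled.

```python
EULA_LINES = ("ACCEPT_EULA = False", "accept_eula=ACCEPT_EULA,")
NOVA_OMITTED_NAMES = ("max_epochs", "lr_warmup_steps_ratio")


def is_meta_model(model_id: str) -> bool:
    """Meta/Llama models are identified by the meta- prefix of the model ID."""
    return model_id.startswith("meta-")


def filter_template_lines(lines: list[str], model_id: str, is_nova: bool) -> list[str]:
    """Drop the lines that the rules above say to omit for this model."""
    kept = []
    for line in lines:
        # Non-Meta models: omit the EULA variable and the EULA argument.
        if not is_meta_model(model_id) and line.strip() in EULA_LINES:
            continue
        # Nova family: omit code mentioning the unsupported hyperparameters.
        if is_nova and any(name in line for name in NOVA_OMITTED_NAMES):
            continue
        kept.append(line)
    return kept


# Hypothetical template lines, for illustration only.
sample = [
    "ACCEPT_EULA = False",
    "trainer = Trainer(",
    "    model=BASE_MODEL,",
    "    accept_eula=ACCEPT_EULA,",
    ")",
    "trainer.hyperparameters.max_epochs = 3",
]
print(filter_template_lines(sample, "example-model", is_nova=True))
```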

1.4 Auto-Generate Configuration Values

In the 'Setup & Credentials' cell, populate:

BASE_MODEL
- Use the exact SageMakerHub model name from context
MODEL_PACKAGE_GROUP_NAME
- Generate from use case (read use_case_spec.md if needed)
- Format rules:
  - Lowercase alphanumeric characters and hyphens only (the pattern below also admits uppercase, so lowercase is the stricter rule to apply)
  - 1-63 characters
  - Pattern: [a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}
  - Example: "Customer Support Chatbot" → customer-support-chatbot-v1
Save the notebook
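
One way to derive a MODEL_PACKAGE_GROUP_NAME that satisfies the format rules above is sketched below. The helper name to_group_name and the default "v1" suffix (taken from the example) are assumptions; the regular expression is the pattern quoted in the rules.

```python
import re

# Pattern quoted in the format rules; the length cap is enforced separately.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_LENGTH = 63


def to_group_name(use_case: str, suffix: str = "v1") -> str:
    """Turn a free-text use case into a lowercase, hyphenated group name."""
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", use_case.lower()).strip("-")
    # Leave room for "-<suffix>" so the suffix is never cut off.
    slug = slug[: MAX_LENGTH - len(suffix) - 1].rstrip("-")
    name = f"{slug}-{suffix}" if slug else suffix
    if len(name) > MAX_LENGTH or not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"cannot build a valid group name from {use_case!r}")
    return name


print(to_group_name("Customer Support Chatbot"))
```

For the example in the rules, this yields customer-support-chatbot-v1. A use case with no ASCII letters or digits falls back to the bare suffix, which may warrant asking the user for a name instead.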

2. RLVR Reward Function (for RLVR only, skip this section if technique is SFT or DPO)

2.1 Check Reward Function Status

Ask if user has a reward function already, or would like help creating one.
- If user says they have one → Ask for the SageMaker Hub Evaluator ARN. Only proceed to Section 2.3 once the user provides a valid Evaluator ARN. If they don't have it registered as a SageMaker Hub Evaluator, continue to 2.2.
- If user says they do not have one → Continue to 2.2

2.2 Generate Reward Function From Template

Follow the workflow in the "Helping Users Create Custom Reward Functions" section of references/rlvr_reward_function.md

2.3 Set CUSTOM_REWARD_FUNCTION value

Set CUSTOM_REWARD_FUNCTION in the notebook to the ARN of the reward function (either given directly by the user, or taken from the function generation code as evaluator.arn).

3. RLAIF (for RLAIF only, skip this section if technique is not RLAIF)

Read references/rlaif_guide.md and follow its instructions.

4. EULA Review and Acceptance

Look up the official license link for the selected base model from references/eula_links.md
Display the license to the user following the phrasing in references/eula_links.md. For OSS models: "This model is licensed under {License}. Please review the license terms here: {URL}." For Nova models: "This model is subject to the AWS Service Terms: {URL}."
Check if the selected base model is a Meta/Llama model (model ID starts with meta-)
- If Meta/Llama: Tell the user they must read and agree to the EULA before using this model. Ask: "Do you accept the license terms? (yes/no)". If the user confirms, set ACCEPT_EULA = True and uncomment accept_eula=ACCEPT_EULA in the generated notebook. If the user declines, leave ACCEPT_EULA = False and warn that training will fail without acceptance.
- If non-Meta: Inform the user of the license for their awareness. No code-level action needed — the ACCEPT_EULA variable and accept_eula parameter should already be omitted from the notebook (see Step 1.3).
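
The decision logic in this step can be sketched as below. The function names are hypothetical, the license name and URL in the usage are placeholders (real values come from references/eula_links.md), and whether a model belongs to the Nova family is passed in as a flag because the text does not define a detection rule.

```python
from typing import Optional


def license_notice(license_name: str, url: str, is_nova: bool = False) -> str:
    """Build the user-facing license sentence using the phrasing quoted above."""
    if is_nova:
        return f"This model is subject to the AWS Service Terms: {url}."
    return (
        f"This model is licensed under {license_name}. "
        f"Please review the license terms here: {url}."
    )


def resolve_accept_eula(model_id: str, user_answer: Optional[str]) -> Optional[bool]:
    """Return the ACCEPT_EULA value, or None when the variable is omitted.

    Only Meta/Llama models (model ID starting with meta-) carry the flag,
    and it becomes True only after an explicit "yes" from the user.
    """
    if not model_id.startswith("meta-"):
        return None  # variable and parameter are left out of the notebook
    return user_answer is not None and user_answer.strip().lower() == "yes"


# Placeholder values for illustration only.
print(license_notice("Example License 1.0", "https://example.com/license"))
print(resolve_accept_eula("meta-example-model", "yes"))
print(resolve_accept_eula("meta-example-model", None))
print(resolve_accept_eula("example-model", "yes"))
```

Treating anything other than an explicit "yes" as a refusal keeps the sketch aligned with the rule that ACCEPT_EULA is never set to True without user confirmation.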

5. Post-Generation

After generating the code, offer to run it. Training can take hours depending on your dataset and model.

Notebook mode: If run_cell is available, offer to run the cells. Otherwise tell the user to run cells themselves.

Script mode: Present the user with options:

"Would you like me to:

1. Leave it to you: run with python3 scripts/[script_name]

2. Run it and wait until it's done

3. Start it but don't wait: we can check status later"

Option 1: Done. Wait for user to come back.
Option 2: Execute the script as-is. trainer.train(wait=True) blocks until complete. Report final status.
Option 3: Change wait=True to wait=False in the script, execute, report the training job name.

Checking status:

aws sagemaker describe-training-job --training-job-name NAME → TrainingJobStatus, FailureReason, SecondaryStatusTransitions
For the model package ARN after completion: aws sagemaker list-model-packages --model-package-group-name GROUP_NAME --sort-by CreationTime --sort-order Descending --max-results 1
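
The same two lookups can be made from Python. The sketch below assumes a boto3 SageMaker client is passed in (for example boto3.client("sagemaker")); the helper names are hypothetical, and the block does not import boto3 so it can be read and tested without AWS access.

```python
from typing import Any, Optional


def training_job_summary(client: Any, job_name: str) -> dict:
    """Collect the three fields named above from DescribeTrainingJob."""
    response = client.describe_training_job(TrainingJobName=job_name)
    transitions = response.get("SecondaryStatusTransitions", [])
    return {
        "status": response["TrainingJobStatus"],
        # FailureReason is only present when the job has failed.
        "failure_reason": response.get("FailureReason"),
        "transitions": [step["Status"] for step in transitions],
    }


def latest_model_package_arn(client: Any, group_name: str) -> Optional[str]:
    """Return the ARN of the newest model package in the group, if any."""
    response = client.list_model_packages(
        ModelPackageGroupName=group_name,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("ModelPackageSummaryList", [])
    return summaries[0]["ModelPackageArn"] if summaries else None
```

Passing the client in as an argument keeps region and credential handling with the already verified SDK environment.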

Showing results after completion:

Use scripts/mlflow_reference.py as the pattern to query MLflow metrics
Present loss by epoch as a text table (total_loss, val_eval_total_loss for SFT; rewards/margins for DPO; critic/rewards/mean for RLVR)
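
A plain-text table of per-epoch metrics could be rendered as sketched below. This covers only the formatting step: the metric values are made up for illustration, format_metric_table is a hypothetical helper, and fetching the values from MLflow is left to the pattern in scripts/mlflow_reference.py.

```python
def format_metric_table(rows: list[tuple[int, dict]], metric_names: list[str]) -> str:
    """Render (epoch, {metric: value}) rows as an aligned text table."""
    header = ["epoch", *metric_names]
    body = [
        [str(epoch)]
        + [f"{values[name]:.4f}" if name in values else "-" for name in metric_names]
        for epoch, values in rows
    ]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(row[i]) for row in [header, *body]) for i in range(len(header))]

    def render(cells: list[str]) -> str:
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths)).rstrip()

    separator = "-+-".join("-" * width for width in widths)
    return "\n".join([render(header), separator, *(render(row) for row in body)])


# Made-up SFT values, for illustration only.
rows = [
    (1, {"total_loss": 1.25, "val_eval_total_loss": 1.5}),
    (2, {"total_loss": 1.0, "val_eval_total_loss": 1.125}),
]
table = format_metric_table(rows, ["total_loss", "val_eval_total_loss"])
print(table)
```

For DPO or RLVR, the same helper can be called with rewards/margins or critic/rewards/mean as the only metric name.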

CRITICAL:

DON'T suggest moving to next steps before training completes
DON'T elaborate on the next steps unless the user specifically asks you about them.

6. Continuous Customization

If the user wants to fine-tune a model they have already customized, follow the instructions in references/continuous_customization.md

References

references/rlvr_reward_function.md - Lambda reward function creation guide (RLVR only)
templates/rlvr_reward_function_source_template.py - Lambda reward function source template for open-weights models (RLVR only)
templates/nova_rlvr_reward_function_source_template.py - Lambda reward function source template for Nova 2.0 Lite (RLVR only)
code_templates/sft.py - Complete notebook template for Supervised Fine-Tuning (OSS path)
code_templates/dpo.py - Complete notebook template for Direct Preference Optimization (OSS path)
code_templates/rlvr.py - Complete notebook template for Reinforcement Learning from Verifiable Rewards (OSS path)
references/continuous_customization.md - Instructions on fine-tuning an already fine-tuned model
references/rlaif_guide.md - Instructions on RLAIF fine-tuning options
code_templates/rlaif_builtin.py - Code template for RLAIF with built-in judge prompt
code_templates/rlaif_custom_prompt.py - Code template for RLAIF with custom judge prompt

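The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation was generated automatically with the Claude API.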