📊data-context-extractor

Plugin: Data
Source: GitHub

Description

Generate or improve a company-specific data analysis skill by extracting tribal knowledge from analysts.

BOOTSTRAP MODE - Triggers: "Create a data context skill", "Set up data analysis for our warehouse", "Help me create a skill for our database", "Generate a data skill for [company]" → Discovers schemas, asks key questions, and generates an initial skill with reference files.

ITERATION MODE - Triggers: "Add context about [domain]", "The skill needs more info about [topic]", "Update the data skill with [metrics/tables/terminology]", "Improve the [domain] reference" → Loads the existing skill, asks targeted questions, and appends to or updates reference files.

Use when data analysts want Claude to understand their company's specific data warehouse, terminology, metric definitions, and common query patterns.

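Use cases

✓ Creating a new company-specific data analysis skill
✓ Adding or updating information in an existing data skill
✓ Teaching Claude the structure of a data warehouse
✓ Sharing company-specific terminology and metric definitions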

Data Context Extractor

A meta-skill that extracts company-specific data knowledge from analysts and generates tailored data analysis skills.

How It Works

This skill has two modes:

Bootstrap Mode: Create a new data analysis skill from scratch
Iteration Mode: Improve an existing skill by adding domain-specific reference files

Bootstrap Mode

Use when: User wants to create a new data context skill for their warehouse.

Phase 1: Database Connection & Discovery

Step 1: Identify the database type

Ask: "What data warehouse are you using?"

Common options:

BigQuery
Snowflake
PostgreSQL/Redshift
Databricks

Use ~~data warehouse tools (query and schema) to connect. If it is unclear which tools apply, check the MCP tools available in the current session.

Step 2: Explore the schema

Use ~~data warehouse schema tools to:

List available datasets/schemas
Identify the most important tables (ask user: "Which 3-5 tables do analysts query most often?")
Pull schema details for those key tables

Sample exploration queries by dialect:

-- BigQuery: List datasets
SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA

-- BigQuery: List tables in a dataset
SELECT table_name FROM `project.dataset.INFORMATION_SCHEMA.TABLES`

-- Snowflake: List schemas
SHOW SCHEMAS IN DATABASE my_database

-- Snowflake: List tables
SHOW TABLES IN SCHEMA my_schema
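
Once the warehouse type is known, the dialect-specific queries above can be kept in a small lookup so the right one is picked automatically. The following is a minimal Python sketch, not part of the skill itself: the `EXPLORATION_QUERIES` mapping and the `exploration_query` helper are hypothetical names, and the PostgreSQL entries are an assumption added for illustration (PostgreSQL exposes `information_schema.schemata` and `information_schema.tables`). Identifiers are interpolated directly into the SQL text, so this is only suitable for trusted names.

```python
# Hypothetical lookup of schema-exploration queries, keyed by dialect.
# BigQuery and Snowflake entries mirror the samples above; the PostgreSQL
# entries are an illustrative assumption.
EXPLORATION_QUERIES = {
    "bigquery": {
        "list_schemas": "SELECT schema_name FROM INFORMATION_SCHEMA.SCHEMATA",
        "list_tables": "SELECT table_name FROM `{project}.{schema}.INFORMATION_SCHEMA.TABLES`",
    },
    "snowflake": {
        "list_schemas": "SHOW SCHEMAS IN DATABASE {database}",
        "list_tables": "SHOW TABLES IN SCHEMA {schema}",
    },
    "postgresql": {
        "list_schemas": "SELECT schema_name FROM information_schema.schemata",
        "list_tables": "SELECT table_name FROM information_schema.tables "
                       "WHERE table_schema = '{schema}'",
    },
}


def exploration_query(dialect: str, action: str, **names: str) -> str:
    """Return the exploration query for a dialect, with placeholders filled in."""
    try:
        template = EXPLORATION_QUERIES[dialect.lower()][action]
    except KeyError:
        raise ValueError(f"No {action!r} query known for dialect {dialect!r}") from None
    return template.format(**names)


print(exploration_query("snowflake", "list_tables", schema="my_schema"))
```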

Phase 2: Core Questions (Ask These)

After schema discovery, ask these questions conversationally (not all at once):

Entity Disambiguation (Critical)

"When people here say 'user' or 'customer', what exactly do they mean? Are there different types?"

Listen for:

Multiple entity types (user vs account vs organization)
Relationships between them (1:1, 1:many, many:many)
Which ID fields link them together

Primary Identifiers

"What's the main identifier for a [customer/user/account]? Are there multiple IDs for the same entity?"

Listen for:

Primary keys vs business keys
UUID vs integer IDs
Legacy ID systems

Key Metrics

"What are the 2-3 metrics people ask about most? How is each one calculated?"

Listen for:

Exact formulas (e.g., ARR = monthly_revenue × 12)
Which tables/columns feed each metric
Time period conventions (trailing 7 days, calendar month, etc.)
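
To make the "exact formula" point concrete, the sketch below records a metric together with its formula, source column, and time-period convention, then evaluates the ARR example from the list. The `MetricDefinition` class, the `billing.subscriptions.monthly_revenue` column, and the revenue figure are hypothetical and only illustrate what a captured answer might look like.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    """One captured metric: what it is called and exactly how it is computed."""
    name: str
    formula: str                       # formula as the analyst states it
    source: str                        # column feeding the metric (hypothetical)
    time_period: str                   # e.g. "calendar month", "trailing 7 days"
    compute: Callable[[float], float]  # executable form of the formula


arr = MetricDefinition(
    name="ARR",
    formula="monthly_revenue * 12",
    source="billing.subscriptions.monthly_revenue",
    time_period="calendar month",
    compute=lambda monthly_revenue: monthly_revenue * 12,
)

# With a made-up monthly revenue of 25,000 the formula gives 300,000.
print(arr.name, arr.compute(25_000))
```

Storing the stated formula next to an executable version makes it easy to spot when the two drift apart.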

Data Hygiene

"What should ALWAYS be filtered out of queries? (test data, fraud, internal users, etc.)"

Listen for:

Standard WHERE clauses to always include
Flag columns that indicate exclusions (e.g., is_test, is_internal, is_fraud)
Specific values to exclude (e.g., status = 'deleted')
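
The answers to this question usually boil down to one reusable WHERE clause. Below is a minimal sketch of assembling such a clause and checking it against a tiny in-memory SQLite table. The `standard_where` helper, the `users` table, and its rows are hypothetical. The sketch assumes flags are stored as 0/1 integers (as in SQLite), so a warehouse with BOOLEAN columns would need `FALSE` instead, and it assumes rows with NULL flags or values should be kept.

```python
import sqlite3


def standard_where(flag_columns: list[str], excluded_values: dict[str, list[str]]) -> str:
    """Build a WHERE clause that drops flagged rows and excluded values.

    Rows whose flag or value is NULL are kept; that is an assumption about
    the desired behaviour, not a rule from the source.
    """
    conditions = [f"COALESCE({col}, 0) = 0" for col in flag_columns]
    for col, values in excluded_values.items():
        quoted = ", ".join("'" + v.replace("'", "''") + "'" for v in values)
        conditions.append(f"({col} IS NULL OR {col} NOT IN ({quoted}))")
    return "WHERE " + " AND ".join(conditions) if conditions else ""


where = standard_where(["is_test", "is_internal"], {"status": ["deleted"]})

# Check the clause against a small in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, is_test INTEGER, is_internal INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, 0, 0, "active"),
        (2, 1, 0, "active"),     # test account
        (3, 0, 1, "active"),     # internal user
        (4, 0, 0, "deleted"),    # excluded status
        (5, 0, None, None),      # unknown flag and status: kept
    ],
)
kept_ids = [row[0] for row in conn.execute(f"SELECT id FROM users {where} ORDER BY id")]
print(where)
print(kept_ids)  # [1, 5]
```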

Common Gotchas

"What mistakes do new analysts typically make with this data?"

Listen for:

Confusing column names
Timezone issues
NULL handling quirks
Historical vs current state tables
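
As an illustration of the timezone gotcha, the sketch below shows one event landing in different calendar months depending on whether it is grouped by its stored UTC timestamp or by a reporting timezone. The event time and the fixed UTC-8 offset are made up for the example. A real reporting timezone would normally use a named zone with daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical event stored in UTC.
event_utc = datetime(2024, 2, 1, 7, 30, tzinfo=timezone.utc)

# Fixed UTC-8 offset standing in for the business's reporting timezone.
reporting_tz = timezone(timedelta(hours=-8))
event_local = event_utc.astimezone(reporting_tz)

print(event_utc.strftime("%Y-%m"))    # 2024-02: month when grouped on the raw UTC timestamp
print(event_local.strftime("%Y-%m"))  # 2024-01: month when grouped in the reporting timezone
```

A monthly metric built on the raw timestamp would count this event in February, while the business would expect it in January, which is the kind of detail worth recording in the generated skill.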

Phase 3: Generate the Skill

Create a skill with this structure:

[company]-data-analyst/
├── SKILL.md
└── references/
    ├── entities.md          # Entity definitions and relationships
    ├── metrics.md           # KPI calculations
    ├── tables/              # One file per domain
    │   ├── [domain1].md
    │   └── [domain2].md
    └── dashboards.json      # Optional: existing dashboards catalog

SKILL.md Template: See references/skill-template.md

SQL Dialect Section: See references/sql-dialects.md and include the appropriate dialect notes.

Reference File Template: See references/domain-template.md
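
The layout above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: `scaffold_skill` is a hypothetical helper, the company and domain names are placeholders, files are created empty rather than filled from the templates, and the optional `dashboards.json` is left out.

```python
import tempfile
from pathlib import Path


def scaffold_skill(root: Path, company: str, domains: list[str]) -> Path:
    """Create the skill directory layout with empty placeholder files."""
    skill_dir = root / f"{company}-data-analyst"
    tables_dir = skill_dir / "references" / "tables"
    tables_dir.mkdir(parents=True, exist_ok=True)

    (skill_dir / "SKILL.md").touch()
    (skill_dir / "references" / "entities.md").touch()
    (skill_dir / "references" / "metrics.md").touch()
    for domain in domains:
        (tables_dir / f"{domain}.md").touch()  # one file per domain
    return skill_dir


# Usage with a throwaway directory and made-up company/domain names.
with tempfile.TemporaryDirectory() as tmp:
    skill = scaffold_skill(Path(tmp), "acme", ["marketing", "finance"])
    created = sorted(p.relative_to(skill).as_posix() for p in skill.rglob("*") if p.is_file())
    print(created)
```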

Phase 4: Package and Deliver

Create all files in the skill directory
Package as a zip file
Present it to the user with a summary of what was captured
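
The packaging step can be sketched with Python's standard `shutil.make_archive`. The `package_skill` helper and the `acme-data-analyst` directory are hypothetical. The archive is built so that it unpacks into a single top-level skill folder, which is an assumption about what is convenient for the recipient rather than a stated requirement.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_skill(skill_dir: Path, out_dir: Path) -> Path:
    """Zip the skill directory so the archive unpacks into one top-level folder."""
    archive = shutil.make_archive(
        base_name=str(out_dir / skill_dir.name),
        format="zip",
        root_dir=skill_dir.parent,
        base_dir=skill_dir.name,
    )
    return Path(archive)


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny placeholder skill to package.
    skill_dir = Path(tmp) / "work" / "acme-data-analyst"
    (skill_dir / "references").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text("# placeholder\n", encoding="utf-8")
    (skill_dir / "references" / "metrics.md").write_text("# Metrics\n", encoding="utf-8")

    out_dir = Path(tmp) / "dist"
    out_dir.mkdir()
    zip_path = package_skill(skill_dir, out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        packaged = sorted(n for n in zf.namelist() if not n.endswith("/"))
    print(zip_path.name, packaged)
```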

Iteration Mode

Use when: User has an existing skill but needs to add more context.

Step 1: Load Existing Skill

Ask the user to upload their existing skill (zip or folder), or locate it if it is already in the session.

Read the current SKILL.md and reference files to understand what's already documented.
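
One quick way to see what is already documented is to list the headings in every Markdown file of the skill. The sketch below assumes the skill is available as a folder (a zip would need extracting first) and uses a hypothetical `inventory_skill` helper with made-up file contents. It treats any line starting with `#` as a heading, so `#` comments inside code samples would be counted too.

```python
import tempfile
from pathlib import Path


def inventory_skill(skill_dir: Path) -> dict[str, list[str]]:
    """Map each Markdown file in the skill to the headings it contains."""
    inventory = {}
    for path in sorted(skill_dir.rglob("*.md")):
        headings = [
            line.lstrip("#").strip()
            for line in path.read_text(encoding="utf-8").splitlines()
            if line.startswith("#")
        ]
        inventory[path.relative_to(skill_dir).as_posix()] = headings
    return inventory


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "acme-data-analyst"
    (skill_dir / "references").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(
        "# Acme Data Analyst\n\n## Knowledge Base Navigation\n", encoding="utf-8"
    )
    (skill_dir / "references" / "metrics.md").write_text("# Metrics\n\n## ARR\n", encoding="utf-8")
    documented = inventory_skill(skill_dir)
    print(documented)
```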

Step 2: Identify the Gap

Ask: "What domain or topic needs more context? What queries are failing or producing wrong results?"

Common gaps:

A new data domain (marketing, finance, product, etc.)
Missing metric definitions
Undocumented table relationships
New terminology

Step 3: Targeted Discovery

For the identified domain:

Explore relevant tables: Use ~~data warehouse schema tools to find tables in that domain
Ask domain-specific questions:
- "What tables are used for [domain] analysis?"
- "What are the key metrics for [domain]?"
- "Any special filters or gotchas for [domain] data?"
Generate new reference file: Create references/tables/[domain].md (matching the Phase 3 layout) using the domain template

Step 4: Update and Repackage

Add the new reference file
Update SKILL.md's "Knowledge Base Navigation" section to include the new domain
Repackage the skill
Present the updated skill to user
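
Updating the navigation section amounts to adding one link line under the right heading. The sketch below is one possible way to do that and makes several assumptions: SKILL.md uses Markdown `#` headings, navigation entries are `- [name](path)` bullets, and the reference path follows the Phase 3 layout. `add_navigation_entry` is a hypothetical helper.

```python
def add_navigation_entry(skill_md: str, domain: str, ref_path: str,
                         section: str = "Knowledge Base Navigation") -> str:
    """Append a link line to the named section of SKILL.md, if not already present."""
    entry = f"- [{domain}]({ref_path})"
    lines = skill_md.splitlines()
    if entry in lines:
        return skill_md  # already linked; nothing to do

    # Locate the section heading (any heading level).
    start = next(
        (i for i, line in enumerate(lines)
         if line.startswith("#") and line.lstrip("#").strip() == section),
        None,
    )
    if start is None:
        raise ValueError(f"Section {section!r} not found in SKILL.md")

    # The section ends at the next heading or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("#")),
        len(lines),
    )
    # Insert after the last non-blank line of the section.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"


original = (
    "# Acme\n\n## Knowledge Base Navigation\n\n"
    "- [finance](references/tables/finance.md)\n\n## SQL Dialect\n\nNotes\n"
)
updated = add_navigation_entry(original, "marketing", "references/tables/marketing.md")
print(updated)
```

Calling the function a second time with the same arguments returns the text unchanged, so repackaging the same domain twice does not duplicate the link.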

Reference File Standards

Each reference file should include:

For Table Documentation

Location: Full table path
Description: What this table contains, when to use it
Primary Key: How to uniquely identify rows
Update Frequency: How often data refreshes
Key Columns: Table with column name, type, description, notes
Relationships: How this table joins to others
Sample Queries: 2-3 common query patterns
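
As a rough illustration of these fields in practice, the sketch below renders one table's documentation as Markdown. The `render_table_doc` helper, the output layout, and the `analytics.core.users` table are all hypothetical. The actual layout should follow the domain template referenced in Phase 3.

```python
FENCE = "`" * 3  # built at runtime so this example can itself sit inside a code fence


def render_table_doc(table: dict) -> str:
    """Render one table's documentation as Markdown, one part per field listed above."""
    lines = [
        f"## {table['name']}",
        "",
        f"- Location: {table['location']}",
        f"- Description: {table['description']}",
        f"- Primary Key: {table['primary_key']}",
        f"- Update Frequency: {table['update_frequency']}",
        "",
        "| Column | Type | Description | Notes |",
        "|---|---|---|---|",
    ]
    lines += [
        f"| {c['name']} | {c['type']} | {c['description']} | {c['notes']} |"
        for c in table["columns"]
    ]
    lines += ["", "Relationships:"]
    lines += [f"- {r}" for r in table["relationships"]]
    lines += ["", "Sample Queries:"]
    for query in table["sample_queries"]:
        lines += [FENCE + "sql", query, FENCE]
    return "\n".join(lines) + "\n"


# Hypothetical table used only to exercise the renderer.
doc = render_table_doc({
    "name": "users",
    "location": "analytics.core.users",
    "description": "One row per registered user; use for user-level counts.",
    "primary_key": "user_id",
    "update_frequency": "Daily",
    "columns": [
        {"name": "user_id", "type": "STRING", "description": "Primary identifier", "notes": "UUID"},
        {"name": "is_test", "type": "BOOLEAN", "description": "Test account flag", "notes": "Exclude when TRUE"},
    ],
    "relationships": ["users.account_id joins to accounts.account_id (many:1)"],
    "sample_queries": ["SELECT COUNT(*) FROM analytics.core.users WHERE NOT is_test"],
})
print(doc)
```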

For Metrics Documentation

Metric Name: Human-readable name
Definition: Plain English explanation
Formula: Exact calculation with column references
Source Table(s): Where the data comes from
Caveats: Edge cases, exclusions, gotchas

For Entity Documentation

Entity Name: What it's called
Definition: What it represents in the business
Primary Table: Where to find this entity
ID Field(s): How to identify it
Relationships: How it relates to other entities
Common Filters: Standard exclusions (internal, test, etc.)
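
Entity documentation is easier to keep consistent when relationships are checked against the set of documented entities. The sketch below models the fields above as a dataclass and reports relationships that point at an entity nobody has documented yet. The `Entity` class, the `undefined_relationship_targets` helper, and the sample entities are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    definition: str
    primary_table: str
    id_fields: list[str]
    relationships: dict[str, str] = field(default_factory=dict)  # other entity -> cardinality
    common_filters: list[str] = field(default_factory=list)


def undefined_relationship_targets(entities: list[Entity]) -> list[str]:
    """Return 'entity -> target' pairs whose target entity has not been documented."""
    known = {e.name for e in entities}
    return [
        f"{e.name} -> {target}"
        for e in entities
        for target in e.relationships
        if target not in known
    ]


entities = [
    Entity("user", "A person with a login", "core.users", ["user_id"],
           relationships={"account": "many:1"}, common_filters=["is_test = FALSE"]),
    Entity("account", "A billing unit", "core.accounts", ["account_id"],
           relationships={"user": "1:many", "organization": "many:1"}),
]
missing = undefined_relationship_targets(entities)
print(missing)  # ['account -> organization']
```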

Quality Checklist

Before delivering a generated skill, verify:

[ ] SKILL.md has complete frontmatter (name, description)
[ ] Entity disambiguation section is clear
[ ] Key terminology is defined
[ ] Standard filters/exclusions are documented
[ ] At least 2-3 sample queries per domain
[ ] SQL uses correct dialect syntax
[ ] Reference files are linked from SKILL.md navigation section
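
Parts of this checklist can be checked mechanically. The sketch below covers only two items, the frontmatter fields and the reference-file links. It assumes frontmatter is a leading block delimited by `---` lines with simple `key: value` pairs, and that a reference file counts as linked if its path appears anywhere in SKILL.md. `check_skill` is a hypothetical helper, and the remaining items still need a human read-through.

```python
import re


def check_skill(skill_md: str, reference_files: list[str]) -> list[str]:
    """Return a list of checklist problems found in SKILL.md (empty if none)."""
    problems = []

    # Frontmatter: a leading block delimited by '---' lines with 'key: value' pairs.
    match = re.match(r"---\n(.*?)\n---\n", skill_md, flags=re.DOTALL)
    keys = set()
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep and value.strip():
                keys.add(key.strip())
    for required in ("name", "description"):
        if required not in keys:
            problems.append(f"frontmatter is missing '{required}'")

    # Every reference file should be mentioned somewhere in SKILL.md.
    for ref in reference_files:
        if ref not in skill_md:
            problems.append(f"{ref} is not linked from SKILL.md")
    return problems


sample = "---\nname: acme-data-analyst\n---\n# Acme\n\nSee references/metrics.md\n"
issues = check_skill(sample, ["references/metrics.md", "references/entities.md"])
print(issues)
```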

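The original text and its copyright belong to Anthropic and the respective plugin authors. The Japanese translation on the source page was generated automatically with the Claude API.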