
🧪instrument-data-to-allotrope

Plugin: Bio Research
Source: GitHub

Description

Convert laboratory instrument output files (PDF, CSV, Excel, TXT) to Allotrope Simple Model (ASM) JSON format or flattened 2D CSV. Use this skill when scientists need to standardize instrument data for LIMS systems, data lakes, or downstream analysis. Supports auto-detection of instrument types. Outputs include full ASM JSON, flattened CSV for easy import, and exportable Python code for data engineers. Common triggers include converting instrument files, standardizing lab data, preparing data for upload to LIMS/ELN systems, or generating parser code for production pipelines.

Use cases

✓ Convert instrument files to a standard format
✓ Standardize lab data for LIMS systems
✓ Prepare data for downstream analysis
✓ Generate parser code for pipelines

Instrument Data to Allotrope Converter

Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.

Note: This is an Example Skill

This skill demonstrates how skills can support your data engineering tasks—automating schema transformations, parsing instrument outputs, and generating production-ready code.

To customize for your organization:

Modify the references/ files to include your company's specific schemas or ontology mappings

Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)

Extend the scripts/ to handle proprietary instrument formats or internal data standards

This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.

Workflow Overview

1. Detect instrument type from file contents (auto-detect or user-specified)
2. Parse file using allotropy library (native) or flexible fallback parser
3. Generate outputs:
   - ASM JSON (full semantic structure)
   - Flattened CSV (2D tabular format)
   - Python parser code (for data engineer handoff)
4. Deliver files with summary and usage instructions

When Uncertain: If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to references/field_classification_guide.md for guidance, but when ambiguity remains, confirm with the user rather than guessing.

Quick Start

# Install requirements first (run in a shell)
pip install allotropy pandas openpyxl pdfplumber --break-system-packages

# Core conversion (Python)
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# Convert with allotropy; the Vendor must match the instrument that produced the file
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)
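
To persist the converted result, something like the following could work. This is a minimal sketch that assumes the value returned by allotrope_from_file is a JSON-serializable dict; save_asm and the placeholder dict are hypothetical names introduced here for illustration.

```python
import json
from pathlib import Path
from typing import Any


def save_asm(asm: dict[str, Any], path: str) -> Path:
    """Write an ASM dictionary to disk as indented UTF-8 JSON."""
    out = Path(path)
    out.write_text(json.dumps(asm, indent=2, ensure_ascii=False), encoding="utf-8")
    return out


# Placeholder standing in for the dict produced by the conversion step above.
example_asm = {"placeholder": "value", "nested": {"unit": "(unitless)"}}
```

Calling save_asm(asm, "instrument_data_asm.json") after the conversion would then give a file that can be passed to the validation step.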

Output Format Selection

ASM JSON (default) - Full semantic structure with ontology URIs

Best for: LIMS systems expecting ASM, data lakes, long-term archival
Validates against Allotrope schemas

Flattened CSV - 2D tabular representation

Best for: Quick analysis, Excel users, systems without JSON support
Each measurement becomes one row with metadata repeated

Both - Generate both formats for maximum flexibility
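
The idea of "one row per measurement with metadata repeated" can be sketched as follows. The input structure and field names here are hypothetical and deliberately simplified (real ASM documents are more deeply nested), and this is not the logic of scripts/flatten_asm.py.

```python
import csv
import io
from typing import Any


def flatten_measurements(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per measurement, repeating document-level metadata."""
    metadata = {k: v for k, v in doc.items() if k != "measurements"}
    return [{**metadata, **m} for m in doc["measurements"]]


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize flat rows to CSV text, using the union of keys as the header."""
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical, simplified input used only to illustrate the flattening idea.
doc = {
    "instrument_serial_number": "SN-0001",
    "measurements": [
        {"well_position": "A1", "measurement_value": 0.52, "measurement_unit": "mAU"},
        {"well_position": "A2", "measurement_value": 0.61, "measurement_unit": "mAU"},
    ],
}
rows = flatten_measurements(doc)
csv_text = rows_to_csv(rows)
```

Repeating the metadata on every row costs some file size but keeps each row self-describing, which is what most spreadsheet and LIMS import tools expect.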

Calculated Data Handling

IMPORTANT: Separate raw measurements from calculated/derived values.

Raw data → measurement-document (direct instrument readings)
Calculated data → calculated-data-aggregate-document (derived values)

Calculated values MUST include traceability via data-source-aggregate-document:

"calculated-data-aggregate-document": {
  "calculated-data-document": [{
    "calculated-data-identifier": "SAMPLE_B1_DIN_001",
    "calculated-data-name": "DNA integrity number",
    "calculated-result": {"value": 9.5, "unit": "(unitless)"},
    "data-source-aggregate-document": {
      "data-source-document": [{
        "data-source-identifier": "SAMPLE_B1_MEASUREMENT",
        "data-source-feature": "electrophoresis trace"
      }]
    }
  }]
}
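
A traceability check over this structure might look like the sketch below. It reuses the key names from the excerpt above; find_untraceable is a hypothetical helper, and the assumption that measurement identifiers are available as a set is made here for illustration only.

```python
from typing import Any


def find_untraceable(calc_aggregate: dict[str, Any], measurement_ids: set[str]) -> list[str]:
    """Return identifiers of calculated values lacking a valid data source.

    A calculated value is flagged when it lists no data-source-document
    entries, or when any referenced identifier is not a known measurement.
    """
    problems = []
    for calc in calc_aggregate.get("calculated-data-document", []):
        sources = calc.get("data-source-aggregate-document", {}).get("data-source-document", [])
        source_ids = [s.get("data-source-identifier") for s in sources]
        if not source_ids or any(sid not in measurement_ids for sid in source_ids):
            problems.append(calc.get("calculated-data-identifier", "<missing identifier>"))
    return problems


calc_aggregate = {
    "calculated-data-document": [
        {
            "calculated-data-identifier": "SAMPLE_B1_DIN_001",
            "calculated-data-name": "DNA integrity number",
            "calculated-result": {"value": 9.5, "unit": "(unitless)"},
            "data-source-aggregate-document": {
                "data-source-document": [
                    {
                        "data-source-identifier": "SAMPLE_B1_MEASUREMENT",
                        "data-source-feature": "electrophoresis trace",
                    }
                ]
            },
        },
        # Deliberately broken entry: no data source recorded.
        {"calculated-data-identifier": "SAMPLE_B2_DIN_001"},
    ]
}
untraceable = find_untraceable(calc_aggregate, {"SAMPLE_B1_MEASUREMENT"})
```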

Common calculated fields by instrument type:

Instrument	Calculated Fields
Cell counter	Viability %, dilution-adjusted cell density
Spectrophotometer	Concentration (from absorbance), 260/280 ratio
Plate reader	Concentrations from standard curve, %CV
Electrophoresis	DIN/RIN, region concentrations, average sizes
qPCR	Relative quantities, fold change

See references/field_classification_guide.md for detailed guidance on raw vs. calculated classification.

Validation

Always validate ASM output before delivering to the user:

python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json  # Compare to reference
python scripts/validate_asm.py output.json --strict  # Treat warnings as errors

Validation Rules:

Based on Allotrope ASM specification (December 2024)
Last updated: 2026-01-07
Source: https://gitlab.com/allotrope-public/asm

Soft Validation Approach: Unknown techniques, units, or sample roles generate warnings (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them—it will flag them for manual verification. Use --strict mode to treat warnings as errors if you need stricter validation.
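
The soft-versus-strict behaviour described above can be illustrated with a small sketch. The allow-list, the ValidationReport class, and check_units are hypothetical; the actual contents and structure of scripts/validate_asm.py are not shown in this document.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """A report passes when it contains no errors (warnings are allowed)."""
        return not self.errors


# Hypothetical allow-list; the real validator's list is not reproduced here.
KNOWN_UNITS = {"(unitless)", "mAU", "%", "ng/uL"}


def check_units(units: list[str], strict: bool = False) -> ValidationReport:
    """Flag unknown units as warnings, or as errors when strict is True."""
    report = ValidationReport()
    for unit in units:
        if unit not in KNOWN_UNITS:
            message = f"unknown unit: {unit!r}"
            (report.errors if strict else report.warnings).append(message)
    return report


soft = check_units(["mAU", "furlong"])
strict = check_units(["mAU", "furlong"], strict=True)
```

In soft mode the unknown unit leaves the report passing but visible for manual review; in strict mode the same finding blocks delivery.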

What it checks:

Correct technique selection (e.g., multi-analyte profiling vs plate reader)
Field naming conventions (space-separated, not hyphenated)
Calculated data has traceability (data-source-aggregate-document)
Unique identifiers exist for measurements and calculated values
Required metadata present
Valid units and sample roles (with soft validation for unknown values)
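
As one illustration of these checks, the unique-identifier rule could be sketched like this. It assumes the measurement and calculated-data identifiers have already been collected into a flat list; duplicate_identifiers is a hypothetical helper, not part of the validator.

```python
from collections import Counter


def duplicate_identifiers(identifiers: list[str]) -> list[str]:
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(identifiers)
    return [ident for ident, n in counts.items() if n > 1]


duplicates = duplicate_identifiers(
    ["SAMPLE_A1_M1", "SAMPLE_A2_M1", "SAMPLE_A1_M1", "SAMPLE_A3_M1", "SAMPLE_A2_M1"]
)
```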

Supported Instruments

See references/supported_instruments.md for the complete list. Key instruments:

Category	Instruments
Cell Counting	Vi-CELL BLU, Vi-CELL XR, NucleoCounter
Spectrophotometry	NanoDrop One/Eight/8000, Lunatic
Plate Readers	SoftMax Pro, EnVision, Gen5, CLARIOstar
ELISA	SoftMax Pro, BMG MARS, MSD Workbench
qPCR	QuantStudio, Bio-Rad CFX
Chromatography	Empower, Chromeleon

Detection & Parsing Strategy

Tier 1: Native allotropy parsing (PREFERRED)

Always try allotropy first. Check available vendors directly:

from allotropy.parser_factory import Vendor

# List all supported vendors
for v in Vendor:
    print(f"{v.name}")

# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS  (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more

When the user provides a file, check if allotropy supports it before falling back to manual parsing. The scripts/convert_to_asm.py auto-detection only covers a subset of allotropy vendors.
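
Checking for a matching vendor by name could be done with a small search over the enum members. The sketch below is an assumption-laden illustration: find_vendor is a hypothetical helper, and DemoVendor is a stand-in enum (using member names listed above) so the example runs without allotropy installed. With allotropy available, the real Vendor enum could be passed in its place.

```python
from enum import Enum
from typing import Type


def find_vendor(vendor_enum: Type[Enum], query: str) -> list[str]:
    """Return member names containing every whitespace-separated token of query.

    Matching is case-insensitive, and hyphens in the query are treated as
    underscores so that "vi-cell blu" can match BECKMAN_VI_CELL_BLU.
    """
    tokens = query.upper().replace("-", "_").split()
    return [m.name for m in vendor_enum if all(t in m.name for t in tokens)]


class DemoVendor(Enum):
    """Stand-in for allotropy's Vendor enum, for illustration only."""

    AGILENT_TAPESTATION_ANALYSIS = "tapestation"
    BECKMAN_VI_CELL_BLU = "vi_cell_blu"
    THERMO_FISHER_NANODROP_EIGHT = "nanodrop_eight"


matches = find_vendor(DemoVendor, "vi-cell blu")
```

An empty result is the signal to consider the fallback tiers described next.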

Tier 2: Flexible fallback parsing

Only use if allotropy doesn't support the instrument. This fallback:

Does NOT generate calculated-data-aggregate-document
Does NOT include full traceability
Produces simplified ASM structure

Use flexible parser with:

Column name fuzzy matching
Unit extraction from headers
Metadata extraction from file structure
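
Two of these techniques, fuzzy column matching and unit extraction from headers, can be sketched with the standard library. The canonical field list, the bracket convention for units, and both helper names are assumptions made for this example; the skill's own flexible parser may work differently.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical field names; a real parser would use its own target schema.
CANONICAL_FIELDS = ["sample identifier", "viability", "cell density", "absorbance"]

# Matches a trailing unit in round or square brackets, e.g. "Viability (%)".
UNIT_PATTERN = re.compile(r"^(?P<name>.*?)\s*[\(\[](?P<unit>[^\)\]]+)[\)\]]\s*$")


def split_header(header: str) -> tuple[str, Optional[str]]:
    """Split 'Viability (%)' into ('Viability', '%'); unit is None if absent."""
    match = UNIT_PATTERN.match(header.strip())
    if match:
        return match.group("name"), match.group("unit").strip()
    return header.strip(), None


def match_field(header: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest canonical field name, or None below the cutoff."""
    name, _ = split_header(header)
    normalized = re.sub(r"[_\-]+", " ", name).lower().strip()
    hits = difflib.get_close_matches(normalized, CANONICAL_FIELDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

A header that matches nothing (match_field returns None) is a good candidate for asking the user rather than guessing.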

Tier 3: PDF extraction

For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
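
A rough sketch of the extraction step is shown below. It assumes the first row of each extracted table is a header row, which will not hold for every instrument report; table_to_records and extract_pdf_records are hypothetical helpers. pdfplumber is imported lazily so the row-handling helper can be used and tested without it.

```python
from typing import Optional


def table_to_records(table: list[list[Optional[str]]]) -> list[dict[str, str]]:
    """Treat the first row as headers and return one dict per remaining row."""
    if not table:
        return []
    headers = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip rows that are entirely empty
            records.append(dict(zip(headers, cells)))
    return records


def extract_pdf_records(pdf_path: str) -> list[dict[str, str]]:
    """Collect records from every table on every page of a PDF."""
    import pdfplumber  # imported lazily so the helper above works without it

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(table_to_records(table))
    return records
```

The resulting records can then be handed to the Tier 2 parsing step in the same way as rows read from a CSV.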

Pre-Parsing Checklist

Before writing a custom parser, ALWAYS:

1. Check if allotropy supports it - Use the native parser if available
2. Find a reference ASM file - Check references/examples/ or ask the user
3. Review the instrument-specific guide - Check references/instrument_guides/
4. Validate against the reference - Run validate_asm.py --reference <file>

Common Mistakes to Avoid

Mistake	Correct Approach
Manifest as object	Use URL string
Lowercase detection types	Use "Absorbance" not "absorbance"
"emission wavelength setting"	Use "detector wavelength setting" for emission
All measurements in one document	Group by well/sample location
Missing procedure metadata	Extract ALL device settings per measurement

Code Export for Data Engineers

Generate standalone Python scripts that scientists can hand off:

# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"

The exported script:

Has no external dependencies beyond pandas/allotropy
Includes inline documentation
Can run in Jupyter notebooks
Is production-ready for data pipelines

File Structure

instrument-data-to-allotrope/
├── SKILL.md                          # This file
├── scripts/
│   ├── convert_to_asm.py            # Main conversion script
│   ├── flatten_asm.py               # ASM → 2D CSV conversion
│   ├── export_parser.py             # Generate standalone parser code
│   └── validate_asm.py              # Validate ASM output quality
└── references/
    ├── supported_instruments.md     # Full instrument list with Vendor enums
    ├── asm_schema_overview.md       # ASM structure reference
    ├── field_classification_guide.md # Where to put different field types
    └── flattening_guide.md          # How flattening works

Usage Examples

Example 1: Vi-CELL BLU file

User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]

Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
   - viCell_Results_asm.json (full ASM)
   - viCell_Results_flat.csv (2D format)
   - viCell_parser.py (exportable code)

Example 2: Request for code handoff

User: "I need to give our data engineer code to parse NanoDrop files"

Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version

Example 3: LIMS-ready flattened output

User: "Convert this ELISA data to a CSV I can upload to our LIMS"

Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
   - sample_identifier, well_position, measurement_value, measurement_unit
   - instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements

Implementation Notes

Installing allotropy

pip install allotropy --break-system-packages

Handling parse failures

If allotropy native parsing fails:

1. Log the error for debugging
2. Fall back to the flexible parser
3. Report the reduced metadata completeness to the user
4. Suggest exporting a different format from the instrument
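
The failure-handling steps above might be wired together roughly as follows. This is a sketch under assumptions: parse_with_fallback is a hypothetical helper, and both parsers are passed in as plain callables so the example does not depend on allotropy or on the skill's scripts.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("asm_conversion")


def parse_with_fallback(
    path: str,
    native_parser: Callable[[str], dict[str, Any]],
    fallback_parser: Callable[[str], dict[str, Any]],
) -> tuple[dict[str, Any], list[str]]:
    """Try the native parser first; on failure, log and use the fallback.

    Returns the parsed document plus a list of notes to relay to the user.
    """
    try:
        return native_parser(path), []
    except Exception as exc:  # broad on purpose: any native failure triggers fallback
        logger.exception("Native parsing failed for %s", path)
        notes = [
            f"Native parsing failed ({type(exc).__name__}); used the flexible fallback parser.",
            "Metadata completeness is reduced in fallback output.",
            "Consider exporting a different file format from the instrument.",
        ]
        return fallback_parser(path), notes
```

Returning the notes alongside the document keeps the user-facing report separate from the debug log.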

ASM Schema Validation

Validate output against Allotrope schemas when available:

import jsonschema
# Schema URLs in references/asm_schema_overview.md

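Original text and copyright belong to Anthropic and the respective plugin authors. The Japanese translation of this page was machine-translated with the Claude API.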