🗄️databricks-dbsql
- プラグイン
- databricks
- ソース
- GitHub で見る ↗
説明
Databricks SQL(DBSQL)の高度な機能およびSQLウェアハウスの機能に関するSkill。 ユーザーが以下のいずれかに言及した場合、このSkillを**必ず**呼び出すこと: 「DBSQL」「Databricks SQL」「SQL warehouse」「SQLスクリプティング」「ストアドプロシージャ」「CALL procedure」「マテリアライズドビュー」「CREATE MATERIALIZED VIEW」「パイプ構文」「`|>`」「ジオスペーシャル」「H3」「`ST_`」「spatial SQL」「コレーション」「COLLATE」「`ai_query`」「`ai_classify`」「`ai_extract`」「`ai_gen`」「AI関数」「`http_request`」「`remote_query`」「`read_files`」「Lakehouse Federation」「再帰CTE」「WITH RECURSIVE」「マルチステートメントトランザクション」「一時テーブル」「テンポラリービュー」「パイプ演算子」 次のような場合にも使用: ユーザーがDatabricks上でのSQLのベストプラクティス、データモデリングパターン、または高度なSQL機能について質問している場合。
原文を表示
Databricks SQL (DBSQL) advanced features and SQL warehouse capabilities. This skill MUST be invoked when the user mentions: "DBSQL", "Databricks SQL", "SQL warehouse", "SQL scripting", "stored procedure", "CALL procedure", "materialized view", "CREATE MATERIALIZED VIEW", "pipe syntax", "|>", "geospatial", "H3", "ST_", "spatial SQL", "collation", "COLLATE", "ai_query", "ai_classify", "ai_extract", "ai_gen", "AI function", "http_request", "remote_query", "read_files", "Lakehouse Federation", "recursive CTE", "WITH RECURSIVE", "multi-statement transaction", "temp table", "temporary view", "pipe operator". SHOULD also invoke when the user asks about SQL best practices, data modeling patterns, or advanced SQL features on Databricks.
ユースケース
- ✓Databricks SQLの高度な機能を使用するとき
- ✓SQLウェアハウスを構築・最適化するとき
- ✓SQLのベストプラクティスを確認するとき
- ✓データモデリングパターンを検討するとき
本文(日本語訳)
Databricks SQL (DBSQL) - 高度な機能
クイックリファレンス
| 機能 | 主な構文 | 対応バージョン | 参考資料 |
|---|---|---|---|
| SQL スクリプティング | BEGIN...END, DECLARE, IF/WHILE/FOR |
DBR 16.3+ | references/sql-scripting.md |
| ストアドプロシージャ | CREATE PROCEDURE, CALL |
DBR 17.0+ | references/sql-scripting.md |
| 再帰 CTE | WITH RECURSIVE |
DBR 17.0+ | references/sql-scripting.md |
| トランザクション | BEGIN ATOMIC...END |
プレビュー | references/sql-scripting.md |
| マテリアライズドビュー | CREATE MATERIALIZED VIEW |
Pro/Serverless | references/materialized-views-pipes.md |
| 一時テーブル | CREATE TEMPORARY TABLE |
全バージョン | references/materialized-views-pipes.md |
| パイプ構文 | |> 演算子 |
DBR 16.1+ | references/materialized-views-pipes.md |
| 地理空間 (H3) | h3_longlatash3(), h3_polyfillash3() |
DBR 11.2+ | references/geospatial-collations.md |
| 地理空間 (ST) | ST_Point(), ST_Contains() ほか 80+ 関数 |
DBR 16.0+ | references/geospatial-collations.md |
| コレーション | COLLATE, UTF8_LCASE, ロケール対応 |
DBR 16.1+ | references/geospatial-collations.md |
| AI 関数 | ai_query(), ai_classify() ほか 11+ 関数 |
DBR 15.1+ | references/ai-functions.md |
| http_request | http_request(conn, ...) |
Pro/Serverless | references/ai-functions.md |
| remote_query | SELECT * FROM remote_query(...) |
Pro/Serverless | references/ai-functions.md |
| read_files | SELECT * FROM read_files(...) |
全バージョン | references/ai-functions.md |
| データモデリング | スタースキーマ、Liquid Clustering | 全バージョン | references/best-practices.md |
よく使うパターン
SQL スクリプティング - 手続き型 ETL
BEGIN
DECLARE v_count INT;
DECLARE v_status STRING DEFAULT 'pending';
SET v_count = (SELECT COUNT(*) FROM catalog.schema.raw_orders WHERE status = 'new');
IF v_count > 0 THEN
INSERT INTO catalog.schema.processed_orders
SELECT *, current_timestamp() AS processed_at
FROM catalog.schema.raw_orders
WHERE status = 'new';
SET v_status = 'completed';
ELSE
SET v_status = 'skipped';
END IF;
SELECT v_status AS result, v_count AS rows_processed;
END
エラーハンドリング付きストアドプロシージャ
CREATE OR REPLACE PROCEDURE catalog.schema.upsert_customers(
IN p_source STRING,
OUT p_rows_affected INT
)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET p_rows_affected = -1;
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = concat('Upsert failed for source: ', p_source);
END;
MERGE INTO catalog.schema.dim_customer AS t
USING (SELECT * FROM identifier(p_source)) AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
SET p_rows_affected = (SELECT COUNT(*) FROM identifier(p_source));
END;
-- 呼び出し方:
CALL catalog.schema.upsert_customers('catalog.schema.staging_customers', ?);
スケジュール更新付きマテリアライズドビュー
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.daily_revenue
CLUSTER BY (order_date)
SCHEDULE EVERY 1 HOUR
COMMENT 'Hourly-refreshed daily revenue by region'
AS SELECT
order_date,
region,
SUM(amount) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.schema.fact_orders
JOIN catalog.schema.dim_store USING (store_id)
GROUP BY order_date, region;
パイプ構文 - 可読性の高い変換処理
-- 従来の SQL をパイプ構文で書き直した例
FROM catalog.schema.fact_orders
|> WHERE order_date >= current_date() - INTERVAL 30 DAYS
|> AGGREGATE SUM(amount) AS total, COUNT(*) AS cnt GROUP BY region, product_category
|> WHERE total > 10000
|> ORDER BY total DESC
|> LIMIT 20;
AI 関数 - LLM によるデータエンリッチメント
-- サポートチケットの分類
SELECT
ticket_id,
description,
ai_classify(description, ARRAY('billing', 'technical', 'account', 'feature_request')) AS category,
ai_analyze_sentiment(description) AS sentiment
FROM catalog.schema.support_tickets
LIMIT 100;
-- テキストからエンティティを抽出
SELECT
doc_id,
ai_extract(content, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
FROM catalog.schema.contracts;
-- 構造化出力を伴う汎用 AI クエリ
SELECT ai_query(
'databricks-meta-llama-3-3-70b-instruct',
concat('Summarize this customer feedback in JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
) AS analysis
FROM catalog.schema.customer_feedback
LIMIT 50;
地理空間 - H3 を使った近接検索
-- H3 インデックスを用いて各顧客から 5km 以内の店舗を検索
WITH customer_h3 AS (
SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
FROM catalog.schema.customers
),
store_h3 AS (
SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
FROM catalog.schema.stores
)
SELECT
c.customer_id,
s.store_id,
ST_Distance(
ST_Point(c.longitude, c.latitude),
ST_Point(s.longitude, s.latitude)
) AS distance_m
FROM customer_h3 c
JOIN store_h3 s ON h3_ischildof(c.h3_cell, h3_toparent(s.h3_cell, 5))
WHERE ST_Distance(
ST_Point(c.longitude, c.latitude),
ST_Point(s.longitude, s.latitude)
) < 5000;
コレーション - 大文字・小文字を区別しない検索
-- 大文字・小文字を区別しないコレーションでテーブルを作成
CREATE TABLE catalog.schema.products (
product_id BIGINT GENERATED ALWAYS AS IDENTITY,
name STRING COLLATE UTF8_LCASE,
category STRING COLLATE UTF8_LCASE,
price DECIMAL(10, 2)
);
-- クエリは自動的に大文字・小文字を区別しない(LOWER() 不要)
SELECT * FROM catalog.schema.products
WHERE name = 'MacBook Pro'; -- 'macbook pro', 'MACBOOK PRO' なども一致
http_request - 外部 API の呼び出し
-- 最初にコネクションを作成(初回のみ)
CREATE CONNECTION my_api_conn
TYPE HTTP
OPTIONS (host 'https://api.example.com', bearer_token secret('scope', 'token'));
-- SQL から API を呼び出す
SELECT
order_id,
http_request(
conn => 'my_api_conn',
method => 'POST',
path => '/v1/validate',
json => to_json(named_struct('order_id', order_id, 'amount', amount))
).text AS api_response
FROM catalog.schema.orders
WHERE needs_validation = true;
read_files - 生ファイルの取り込み
-- Volume 内の JSON ファイルをスキーマヒント付きで読み込む
SELECT *
FROM read_files(
'/Volumes/catalog/schema/raw/events/',
format => 'json',
schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
pathGlobFilter => '*.json',
recursiveFileLookup => true
);
-- オプション付きで CSV を読み込む
SELECT *
FROM read_files(
'/Volumes/catalog/schema/raw/sales/',
format => 'csv',
header => true,
delimiter => '|',
dateFormat => 'yyyy-MM-dd',
schema => 'sale_id INT, sale_date DATE, amount DECIMAL(10,2), store STRING'
);
再帰 CTE - 階層構造のトラバーサル
WITH RECURSIVE org_chart AS (
-- アンカー: 最上位マネージャー
SELECT employee_id, name, manager_id, 0 AS depth, ARRAY(name) AS path
FROM catalog.schema.employees
WHERE manager_id IS NULL
UNION ALL
-- 再帰: 直属の部下
SELECT e.employee_id, e.name, e.manager_id, o.depth + 1, array_append(o.path, e.name)
FROM catalog.schema.employees e
JOIN org_chart o ON e.manager_id = o.employee_id
WHERE o.depth < 10 -- 安全上限
)
SELECT * FROM org_chart ORDER BY depth, name;
remote_query - フェデレーテッドクエリ
-- Lakehouse Federation 経由で PostgreSQL をクエリ
SELECT *
FROM remote_query(
'my_postgres_connection',
database => 'my_database',
query => 'SELECT customer_id, email, created_at FROM customers WHERE active = true'
);
リファレンスファイル
詳細な構文・完全なパラメータ一覧・応用パターンについては、以下のファイルを参照してください。
| ファイル | 内容 | 次のような場合に使用 |
|---|---|---|
| references/sql-scripting.md | SQL スクリプティング、ストアドプロシージャ、再帰 CTE、トランザクション | 手続き型 SQL、エラーハンドリング、ループ、動的 SQL が必要な場合 |
| references/materialized-views-pipes.md | マテリアライズドビュー、一時テーブル/ビュー、パイプ構文 | MV、更新スケジュール、一時オブジェクト、パイプ演算子が必要な場合 |
| references/geospatial-collations.md | H3 関数 39 個、ST 関数 80+ 個、コレーションの種類と階層 | 空間分析、H3 インデックス、大文字/アクセント処理が必要な場合 |
| references/ai-functions.md | AI 関数 13 個、http_request、remote_query、read_files(全オプション) | AI エンリッチメント、API 呼び出し、フェデレーション、ファイル取り込みが必要な場合 |
| references/best-practices.md | データモデリング、パフォーマンス、Liquid Clustering、アンチパターン | アーキテクチャ指針、最適化、またはモデリングのアドバイスが必要な場合 |
主なガイドライン
- AI 関数・MV・http_request には Serverless SQL ウェアハウスを使用すること
- AI 関数の開発中はコスト管理のために
LIMITを付けること - 新規テーブルではパーティショニングより Liquid Clustering を優先すること(クラスタリングキーは最大 1〜4 個)
- クラスタリングキーが不明な場合は
CLUSTER BY AUTOを使用すること - BI 向け Gold レイヤーではスタースキーマを採用すること(Silver レイヤーでは OBT も許容)
- ディメンショナルモデルにはクエリ最適化のために PK/FK 制約を定義すること
- 大文字・小文字を区別しない検索が必要なユーザー向け文字列カラムには
COLLATE UTF8_LCASEを使用すること - デプロイ前に CLI(
databricks experimental aitools tools query)またはノートブックで SQL をテストすること。 CLI バージョンによって--warehouseオプションが拒否される場合は、代わりに環境変数DATABRICKS_WAREHOUSE_IDを設定すること。
原文(English)を表示
Databricks SQL (DBSQL) - Advanced Features
Quick Reference
| Feature | Key Syntax | Since | Reference |
|---|---|---|---|
| SQL Scripting | BEGIN...END, DECLARE, IF/WHILE/FOR |
DBR 16.3+ | references/sql-scripting.md |
| Stored Procedures | CREATE PROCEDURE, CALL |
DBR 17.0+ | references/sql-scripting.md |
| Recursive CTEs | WITH RECURSIVE |
DBR 17.0+ | references/sql-scripting.md |
| Transactions | BEGIN ATOMIC...END |
Preview | references/sql-scripting.md |
| Materialized Views | CREATE MATERIALIZED VIEW |
Pro/Serverless | references/materialized-views-pipes.md |
| Temp Tables | CREATE TEMPORARY TABLE |
All | references/materialized-views-pipes.md |
| Pipe Syntax | |> operator |
DBR 16.1+ | references/materialized-views-pipes.md |
| Geospatial (H3) | h3_longlatash3(), h3_polyfillash3() |
DBR 11.2+ | references/geospatial-collations.md |
| Geospatial (ST) | ST_Point(), ST_Contains(), 80+ funcs |
DBR 16.0+ | references/geospatial-collations.md |
| Collations | COLLATE, UTF8_LCASE, locale-aware |
DBR 16.1+ | references/geospatial-collations.md |
| AI Functions | ai_query(), ai_classify(), 11+ funcs |
DBR 15.1+ | references/ai-functions.md |
| http_request | http_request(conn, ...) |
Pro/Serverless | references/ai-functions.md |
| remote_query | SELECT * FROM remote_query(...) |
Pro/Serverless | references/ai-functions.md |
| read_files | SELECT * FROM read_files(...) |
All | references/ai-functions.md |
| Data Modeling | Star schema, Liquid Clustering | All | references/best-practices.md |
Common Patterns
SQL Scripting - Procedural ETL
BEGIN
DECLARE v_count INT;
DECLARE v_status STRING DEFAULT 'pending';
SET v_count = (SELECT COUNT(*) FROM catalog.schema.raw_orders WHERE status = 'new');
IF v_count > 0 THEN
INSERT INTO catalog.schema.processed_orders
SELECT *, current_timestamp() AS processed_at
FROM catalog.schema.raw_orders
WHERE status = 'new';
SET v_status = 'completed';
ELSE
SET v_status = 'skipped';
END IF;
SELECT v_status AS result, v_count AS rows_processed;
END
Stored Procedure with Error Handling
CREATE OR REPLACE PROCEDURE catalog.schema.upsert_customers(
IN p_source STRING,
OUT p_rows_affected INT
)
LANGUAGE SQL
SQL SECURITY INVOKER
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
SET p_rows_affected = -1;
SIGNAL SQLSTATE '45000'
SET MESSAGE_TEXT = concat('Upsert failed for source: ', p_source);
END;
MERGE INTO catalog.schema.dim_customer AS t
USING (SELECT * FROM identifier(p_source)) AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
SET p_rows_affected = (SELECT COUNT(*) FROM identifier(p_source));
END;
-- Invoke:
CALL catalog.schema.upsert_customers('catalog.schema.staging_customers', ?);
Materialized View with Scheduled Refresh
CREATE OR REPLACE MATERIALIZED VIEW catalog.schema.daily_revenue
CLUSTER BY (order_date)
SCHEDULE EVERY 1 HOUR
COMMENT 'Hourly-refreshed daily revenue by region'
AS SELECT
order_date,
region,
SUM(amount) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.schema.fact_orders
JOIN catalog.schema.dim_store USING (store_id)
GROUP BY order_date, region;
Pipe Syntax - Readable Transformations
-- Traditional SQL rewritten with pipe syntax
FROM catalog.schema.fact_orders
|> WHERE order_date >= current_date() - INTERVAL 30 DAYS
|> AGGREGATE SUM(amount) AS total, COUNT(*) AS cnt GROUP BY region, product_category
|> WHERE total > 10000
|> ORDER BY total DESC
|> LIMIT 20;
AI Functions - Enrich Data with LLMs
-- Classify support tickets
SELECT
ticket_id,
description,
ai_classify(description, ARRAY('billing', 'technical', 'account', 'feature_request')) AS category,
ai_analyze_sentiment(description) AS sentiment
FROM catalog.schema.support_tickets
LIMIT 100;
-- Extract entities from text
SELECT
doc_id,
ai_extract(content, ARRAY('person_name', 'company', 'dollar_amount')) AS entities
FROM catalog.schema.contracts;
-- General-purpose AI query with structured output
SELECT ai_query(
'databricks-meta-llama-3-3-70b-instruct',
concat('Summarize this customer feedback in JSON with keys: topic, sentiment, action_items. Feedback: ', feedback),
returnType => 'STRUCT<topic STRING, sentiment STRING, action_items ARRAY<STRING>>'
) AS analysis
FROM catalog.schema.customer_feedback
LIMIT 50;
Geospatial - Proximity Search with H3
-- Find stores within 5km of each customer using H3 indexing
WITH customer_h3 AS (
SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
FROM catalog.schema.customers
),
store_h3 AS (
SELECT *, h3_longlatash3(longitude, latitude, 7) AS h3_cell
FROM catalog.schema.stores
)
SELECT
c.customer_id,
s.store_id,
ST_Distance(
ST_Point(c.longitude, c.latitude),
ST_Point(s.longitude, s.latitude)
) AS distance_m
FROM customer_h3 c
JOIN store_h3 s ON h3_ischildof(c.h3_cell, h3_toparent(s.h3_cell, 5))
WHERE ST_Distance(
ST_Point(c.longitude, c.latitude),
ST_Point(s.longitude, s.latitude)
) < 5000;
Collation - Case-Insensitive Search
-- Create table with case-insensitive collation
CREATE TABLE catalog.schema.products (
product_id BIGINT GENERATED ALWAYS AS IDENTITY,
name STRING COLLATE UTF8_LCASE,
category STRING COLLATE UTF8_LCASE,
price DECIMAL(10, 2)
);
-- Queries automatically case-insensitive (no LOWER() needed)
SELECT * FROM catalog.schema.products
WHERE name = 'MacBook Pro'; -- matches 'macbook pro', 'MACBOOK PRO', etc.
http_request - Call External APIs
-- Set up connection first (one-time)
CREATE CONNECTION my_api_conn
TYPE HTTP
OPTIONS (host 'https://api.example.com', bearer_token secret('scope', 'token'));
-- Call API from SQL
SELECT
order_id,
http_request(
conn => 'my_api_conn',
method => 'POST',
path => '/v1/validate',
json => to_json(named_struct('order_id', order_id, 'amount', amount))
).text AS api_response
FROM catalog.schema.orders
WHERE needs_validation = true;
read_files - Ingest Raw Files
-- Read JSON files from a Volume with schema hints
SELECT *
FROM read_files(
'/Volumes/catalog/schema/raw/events/',
format => 'json',
schemaHints => 'event_id STRING, timestamp TIMESTAMP, payload MAP<STRING, STRING>',
pathGlobFilter => '*.json',
recursiveFileLookup => true
);
-- Read CSV with options
SELECT *
FROM read_files(
'/Volumes/catalog/schema/raw/sales/',
format => 'csv',
header => true,
delimiter => '|',
dateFormat => 'yyyy-MM-dd',
schema => 'sale_id INT, sale_date DATE, amount DECIMAL(10,2), store STRING'
);
Recursive CTE - Hierarchy Traversal
WITH RECURSIVE org_chart AS (
-- Anchor: top-level managers
SELECT employee_id, name, manager_id, 0 AS depth, ARRAY(name) AS path
FROM catalog.schema.employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: direct reports
SELECT e.employee_id, e.name, e.manager_id, o.depth + 1, array_append(o.path, e.name)
FROM catalog.schema.employees e
JOIN org_chart o ON e.manager_id = o.employee_id
WHERE o.depth < 10 -- safety limit
)
SELECT * FROM org_chart ORDER BY depth, name;
remote_query - Federated Queries
-- Query PostgreSQL via Lakehouse Federation
SELECT *
FROM remote_query(
'my_postgres_connection',
database => 'my_database',
query => 'SELECT customer_id, email, created_at FROM customers WHERE active = true'
);
Reference Files
Load these for detailed syntax, full parameter lists, and advanced patterns:
| File | Contents | When to Read |
|---|---|---|
| references/sql-scripting.md | SQL Scripting, Stored Procedures, Recursive CTEs, Transactions | User needs procedural SQL, error handling, loops, dynamic SQL |
| references/materialized-views-pipes.md | Materialized Views, Temp Tables/Views, Pipe Syntax | User needs MVs, refresh scheduling, temp objects, pipe operator |
| references/geospatial-collations.md | 39 H3 functions, 80+ ST functions, Collation types and hierarchy | User needs spatial analysis, H3 indexing, case/accent handling |
| references/ai-functions.md | 13 AI functions, http_request, remote_query, read_files (all options) | User needs AI enrichment, API calls, federation, file ingestion |
| references/best-practices.md | Data modeling, performance, Liquid Clustering, anti-patterns | User needs architecture guidance, optimization, or modeling advice |
Key Guidelines
- Always use Serverless SQL warehouses for AI functions, MVs, and http_request
- Use
LIMITduring development with AI functions to control costs - Prefer Liquid Clustering over partitioning for new tables (1-4 keys max)
- Use
CLUSTER BY AUTOwhen unsure about clustering keys - Star schema in Gold layer for BI; OBT acceptable in Silver
- Define PK/FK constraints on dimensional models for query optimization
- Use
COLLATE UTF8_LCASEfor user-facing string columns that need case-insensitive search - Test SQL via CLI (
databricks experimental aitools tools query) or notebooks before deploying. If--warehouseis rejected on your CLI version, setDATABRICKS_WAREHOUSE_IDin the environment instead.
原文・著作権は Anthropic および各プラグイン作者に帰属します。日本語訳は Claude API による自動翻訳です。