Run analytical SQL on CSV/Parquet/JSON files via mcp-server-duckdb (uvx)
DuckDB can read CSV/Parquet/JSON files directly with read_csv_auto(), so you skip the entire ETL step — no CREATE TABLE, no COPY, no schema definition. Point SQL at your file and go. One tool (query), one argument (query), full DuckDB SQL including window functions, CTEs, and multi-file joins.
Recipe: Analytical SQL on local CSV files via mcp-server-duckdb
Server: mcp-server-duckdb v1.27.2 via uvx Transport: stdio (NDJSON) Launch: uvx mcp-server-duckdb --db-path /tmp/my.db Tools: 1 — query (param: query, type: string, required)
What it does
DuckDB reads CSV/Parquet/JSON files in-place with read_csv_auto('/path/to/file.csv') — no CREATE TABLE, no import step. Full SQL: GROUP BY, window functions, CTEs, JOINs across multiple files, HAVING, ORDER BY.
Gotcha
The tool parameter is query (not sql). Sending {"sql": "..."} returns Input validation error: 'query' is a required property.
Latency
- Query 1 (GROUP BY + SUM): 34ms
- Query 2 (GROUP BY two cols): 6ms
- Query 3 (SUM + window function): 9ms
Cold start (uvx + DuckDB init): ~3s.
Trace (3 queries on an 8-row sales CSV)
Query 1 — aggregate by product:
SELECT product, SUM(revenue) as total_revenue, SUM(units) as total_units
FROM read_csv_auto('/tmp/sales_data.csv')
GROUP BY product ORDER BY total_revenue DESC→ [('Widget B', 69800, 1163), ('Widget A', 47600, 952)]
Query 2 — cross-tab by region + quarter:
SELECT region, quarter, SUM(revenue) as revenue, SUM(units) as units
FROM read_csv_auto('/tmp/sales_data.csv')
GROUP BY region, quarter ORDER BY region, quarter→ [('North', 'Q1', 30800, 555), ('North', 'Q2', 33600, 607), ('South', 'Q1', 25000, 449), ('South', 'Q2', 28000, 504)]
Query 3 — percentage of total with window function:
SELECT product, region, SUM(revenue) as revenue,
ROUND(100.0 * SUM(revenue) / SUM(SUM(revenue)) OVER (), 1) as pct_of_total
FROM read_csv_auto('/tmp/sales_data.csv')
GROUP BY product, region ORDER BY revenue DESC→ [('Widget B', 'North', 37800, 32.2), ('Widget B', 'South', 32000, 27.3), ('Widget A', 'North', 26600, 22.7), ('Widget A', 'South', 21000, 17.9)]
{ "server": "mcp-server-duckdb", "version": "1.27.2", "launch": "uvx mcp-server-duckdb --db-path /tmp/my.db", "transport": "stdio", "framing": "ndjson", "tools": [ { "name": "query", "params": { "query": "string (required)" } } ], "trace": { "initialize": { "serverInfo": { "name": "mcp-duckdb-server", "version": "1.27.2" } }, "calls": [ { "tool": "query", "args": { "query": "SELECT product, SUM(revenue) as total_revenue, SUM(units) as total_units FROM read_csv_auto('/tmp/sales_data.csv') GROUP BY product ORDER BY total_revenue DESC" }, "result": "[('Widget B', 69800, 1163), ('Widget A', 47600, 952)]", "latency_ms": 34, "success": true }, { "tool": "query", "args": { "query": "SELECT region, quarter, SUM(revenue) as revenue, SUM(units) as units FROM read_csv_auto('/tmp/sales_data.csv') GROUP BY region, quarter ORDER BY region, quarter" }, "result": "[('North', 'Q1', 30800, 555), ('North', 'Q2', 33600, 607), ('South', 'Q1', 25000, 449), ('South', 'Q2', 28000, 504)]", "latency_ms": 6, "success": true }, { "tool": "query", "args": { "query": "SELECT product, region, SUM(revenue) as revenue, ROUND(100.0 * SUM(revenue) / SUM(SUM(revenue)) OVER (), 1) as pct_of_total FROM read_csv_auto('/tmp/sales_data.csv') GROUP BY product, region ORDER BY revenue DESC" }, "result": "[('Widget B', 'North', 37800, 32.2), ('Widget B', 'South', 32000, 27.3), ('Widget A', 'North', 26600, 22.7), ('Widget A', 'South', 21000, 17.9)]", "latency_ms": 9, "success": true } ] } }