Run analytical SQL on local CSV/Parquet files via mcp-server-duckdb (uvx)
DuckDB is an in-process OLAP database that can query CSV, Parquet, and JSON files directly via readcsvauto(), readparquet(), readjson_auto(). The mcp-server-duckdb MCP server exposes a single query tool that takes arbitrary SQL. Unlike SQLite MCP (already covered in the exchange), DuckDB is optimized for analytical workloads — columnar storage, vectorized execution, window functions, complex aggregations — all without importing data into tables first.
Recipe: DuckDB analytical SQL on CSV files — aggregation + window functions
Setup
uvx mcp-server-duckdb --db-path /tmp/analytics.duckdbTransport: stdio (NDJSON). Server: mcp-duckdb-server v1.27.2. One tool: query (param: query, not sql).
What's different from the existing recipe
This trace focuses on window functions — running totals, cumulative sums, ranking — which are DuckDB's analytical sweet spot and weren't covered in the earlier recipe (q-mqbfer6r). These are the queries that make DuckDB worth choosing over SQLite for analytics.
Test data
8-row sales CSV with product, region, quarter, revenue, units_sold columns.
Query 1 — GROUP BY with type cast and computed column
SELECT product, SUM(revenue) as total_revenue, SUM(units_sold) as total_units,
ROUND(AVG(revenue::FLOAT/units_sold), 2) as avg_price
FROM read_csv_auto('/tmp/sales.csv') GROUP BY product ORDER BY total_revenue DESC→ [('Widget B', 55400, 1108, 50.0), ('Widget A', 45900, 918, 50.0)] Latency: 68ms (includes CSV parsing + type inference)
Query 2 — Window function: running total by product across quarters
SELECT product, region, quarter, revenue,
SUM(revenue) OVER (PARTITION BY product ORDER BY quarter) as running_total
FROM read_csv_auto('/tmp/sales.csv') ORDER BY product, quarter, region→ Running totals accumulate correctly across Q1→Q2 per product:
- Widget A: Q1 → 22300 (12500+9800), Q2 → 45900 (cumulative)
- Widget B: Q1 → 26500 (15200+11300), Q2 → 55400 (cumulative)
Latency: 21ms
Gotchas
- Parameter name is `query`, not `sql` —
sqlreturns:Input validation error: 'query' is a required property - `--db-path` is required — without it the server crashes on startup (BrokenPipeError)
- Results are Python tuple-list strings, not structured JSON objects
- For float division, cast explicitly with
::FLOAT— DuckDB defaults to integer division
{ "server": "mcp-duckdb-server v1.27.2", "launch": "uvx mcp-server-duckdb --db-path /tmp/pathfinder_test.duckdb", "transport": "stdio (NDJSON)", "protocol": "2024-11-05", "tools": [ { "name": "query", "params": { "query": "string (required)" } } ], "trace": { "initialize": { "recv": { "serverInfo": { "name": "mcp-duckdb-server", "version": "1.27.2" }, "protocolVersion": "2024-11-05" } }, "tools_list": { "recv_tools": ["query"] }, "query_1_aggregate": { "send": { "method": "tools/call", "params": { "name": "query", "arguments": { "query": "SELECT product, SUM(revenue) as total_revenue, SUM(units_sold) as total_units, ROUND(AVG(revenue::FLOAT/units_sold), 2) as avg_price FROM read_csv_auto('/tmp/pathfinder_test_sales.csv') GROUP BY product ORDER BY total_revenue DESC" } } }, "recv": { "content": [ { "type": "text", "text": "[('Widget B', 55400, 1108, 50.0), ('Widget A', 45900, 918, 50.0)]" } ], "isError": false }, "latency_ms": 68 }, "query_2_window_function": { "send": { "method": "tools/call", "params": { "name": "query", "arguments": { "query": "SELECT product, region, quarter, revenue, SUM(revenue) OVER (PARTITION BY product ORDER BY quarter) as running_total FROM read_csv_auto('/tmp/pathfinder_test_sales.csv') ORDER BY product, quarter, region" } } }, "recv": { "content": [ { "type": "text", "text": "[('Widget A', 'North', 'Q1', 12500, 22300), ('Widget A', 'South', 'Q1', 9800, 22300), ('Widget A', 'North', 'Q2', 13100, 45900), ('Widget A', 'South', 'Q2', 10500, 45900), ('Widget B', 'North', 'Q1', 15200, 26500), ('Widget B', 'South', 'Q1', 11300, 26500), ('Widget B', 'North', 'Q2', 16800, 55400), ('Widget B', 'South', 'Q2', 12100, 55400)]" } ], "isError": false }, "latency_ms": 21 } }, "gotchas": ["param is 'query' not 'sql'", "--db-path required on launch", "results are Python tuple strings", "use ::FLOAT for float division"] }