Run analytical SQL on local CSV/Parquet/JSON files via mcp-server-duckdb (uvx)
How do you run analytical SQL queries directly on CSV, Parquet, or JSON files without setting up a database server? DuckDB's in-process engine does this via a single MCP tool — query — that accepts any valid DuckDB SQL including read_csv_auto(), read_parquet(), read_json_auto(), window functions, and DuckDB-specific extensions like QUALIFY.
Recipe: Analytical SQL on local CSV files via mcp-server-duckdb
Launch
uvx mcp-server-duckdb --db-path /tmp/my-analytics.dbTransport: stdio. Server: mcp-duckdb-server v1.27.2. One tool: query.
Tool inventory
| tool | params | description |
|---|---|---|
query | query: string (required) | Execute any DuckDB SQL — DDL, DML, analytics, file reads |
Key capability: Direct file queries
DuckDB can query CSV, Parquet, and JSON files directly without importing:
read_csv_auto('/path/to/file.csv')— auto-detect schemaread_parquet('/path/to/file.parquet')read_json_auto('/path/to/file.json')
Verified: GROUP BY aggregation on CSV
Query: Revenue by product with computed average unit price
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"query","arguments":{"query":"SELECT product, SUM(units) as total_units, SUM(revenue) as total_revenue, ROUND(AVG(revenue/units),2) as avg_unit_price FROM read_csv_auto('/tmp/pathfinder-sales.csv') GROUP BY product ORDER BY total_revenue DESC"}}}Response:
{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"[('Widget A', 710, 14200.0, 20.0), ('Widget B', 400, 8000.0, 20.0), ('Widget C', 155, 7750.0, 50.0)]"}],"isError":false}}Verified: Window functions + QUALIFY
Query: Top revenue product per region using RANK() + DuckDB's QUALIFY clause
{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"query","arguments":{"query":"SELECT region, product, revenue, RANK() OVER (PARTITION BY region ORDER BY revenue DESC) as rank_in_region FROM read_csv_auto('/tmp/pathfinder-sales.csv') QUALIFY rank_in_region = 1 ORDER BY revenue DESC"}}}Response:
{"jsonrpc":"2.0","id":4,"result":{"content":[{"type":"text","text":"[('South', 'Widget A', 6000.0, 1), ('East', 'Widget B', 3500.0, 1), ('North', 'Widget B', 2800.0, 1), ('West', 'Widget C', 2250.0, 1)]"}],"isError":false}}Notes
- Cold start: ~3s via uvx (Python install + server init)
- Query latency: sub-100ms for small files
- QUALIFY: DuckDB-specific SQL extension — filters on window functions without a subquery. Very useful for top-N-per-group queries.
- No schema import needed:
read_csv_autoinfers column types automatically - Output format: tuples as text, not structured JSON rows — parse client-side if needed
- --db-path: required flag, creates a DuckDB database file (can also be
:memory:for ephemeral)
When to use vs SQLite MCP
Use DuckDB when you need: direct file queries (CSV/Parquet/JSON without import), window functions, QUALIFY, or analytical workloads. Use SQLite when you need: persistent transactional storage, simpler queries, or smaller footprint.
{ "server": "mcp-duckdb-server", "version": "1.27.2", "launch": "uvx mcp-server-duckdb --db-path /tmp/pathfinder-duckdb-test.db", "transport": "stdio", "tools": ["query"], "traces": [ { "id": 3, "method": "tools/call", "tool": "query", "args": { "query": "SELECT product, SUM(units) as total_units, SUM(revenue) as total_revenue, ROUND(AVG(revenue/units),2) as avg_unit_price FROM read_csv_auto('/tmp/pathfinder-sales.csv') GROUP BY product ORDER BY total_revenue DESC" }, "result": "[('Widget A', 710, 14200.0, 20.0), ('Widget B', 400, 8000.0, 20.0), ('Widget C', 155, 7750.0, 50.0)]", "isError": false }, { "id": 4, "method": "tools/call", "tool": "query", "args": { "query": "SELECT region, product, revenue, RANK() OVER (PARTITION BY region ORDER BY revenue DESC) as rank_in_region FROM read_csv_auto('/tmp/pathfinder-sales.csv') QUALIFY rank_in_region = 1 ORDER BY revenue DESC" }, "result": "[('South', 'Widget A', 6000.0, 1), ('East', 'Widget B', 3500.0, 1), ('North', 'Widget B', 2800.0, 1), ('West', 'Widget C', 2250.0, 1)]", "isError": false } ] }