Run analytical SQL on local CSV/Parquet/JSON files via mcp-server-duckdb (uvx)

Question

How do you run analytical SQL queries directly on CSV, Parquet, or JSON files without setting up a database server? DuckDB's in-process engine does this via a single MCP tool — `query` — that accepts any valid DuckDB SQL including `read_csv_auto()`, `read_parquet()`, `read_json_auto()`, window functions, and DuckDB-specific extensions like QUALIFY.

Accepted Answer

## Recipe: Analytical SQL on local CSV files via mcp-server-duckdb

### Launch
```bash
uvx mcp-server-duckdb --db-path /tmp/my-analytics.db
```
Transport: stdio. Server: `mcp-duckdb-server` v1.27.2. One tool: `query`.

### Tool inventory
| tool | params | description |
|------|--------|-------------|
| `query` | `query: string` (required) | Execute any DuckDB SQL — DDL, DML, analytics, file reads |

### Key capability: Direct file queries
DuckDB can query CSV, Parquet, and JSON files directly without importing:
- `read_csv_auto('/path/to/file.csv')` — auto-detect schema
- `read_parquet('/path/to/file.parquet')`
- `read_json_auto('/path/to/file.json')`

### Verified: GROUP BY aggregation on CSV
Query: Revenue by product with computed average unit price
```json
{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"query","arguments":{"query":"SELECT product, SUM(units) as total_units, SUM(revenue) as total_revenue, ROUND(AVG(revenue/units),2) as avg_unit_price FROM read_csv_auto('/tmp/pathfinder-sales.csv') GROUP BY product ORDER BY total_revenue DESC"}}}
```
Response:
```json
{"jsonrpc":"2.0","id":3,"result":{"content":[{"type":"text","text":"[('Widget A', 710, 14200.0, 20.0), ('Widget B', 400, 8000.0, 20.0), ('Widget C', 155, 7750.0, 50.0)]"}],"isError":false}}
```

### Verified: Window functions + QUALIFY
Query: Top revenue product per region using RANK() + DuckDB's QUALIFY clause
```json
{"jsonrpc":"2.0","id":4,"method":"tools/call","params":{"name":"query","arguments":{"query":"SELECT region, product, revenue, RANK() OVER (PARTITION BY region ORDER BY revenue DESC) as rank_in_region FROM read_csv_auto('/tmp/pathfinder-sales.csv') QUALIFY rank_in_region = 1 ORDER BY revenue DESC"}}}
```
Response:
```json
{"jsonrpc":"2.0","id":4,"result":{"content":[{"type":"text","text":"[('South', 'Widget A', 6000.0, 1), ('East', 'Widget B', 3500.0, 1), ('North', 'Widget B', 2800.0, 1), ('West', 'Widget C', 2250.0, 1)]"}],"isError":false}}
```

### Notes
- **Cold start**: ~3s via uvx (Python install + server init)
- **Query latency**: sub-100ms for small files
- **QUALIFY**: DuckDB-specific SQL extension — filters on window functions without a subquery. Very useful for top-N-per-group queries.
- **No schema import needed**: `read_csv_auto` infers column types automatically
- **Output format**: tuples as text, not structured JSON rows — parse client-side if needed
- **--db-path**: required flag, creates a DuckDB database file (can also be `:memory:` for ephemeral)

### When to use vs SQLite MCP
Use DuckDB when you need: direct file queries (CSV/Parquet/JSON without import), window functions, QUALIFY, or analytical workloads. Use SQLite when you need: persistent transactional storage, simpler queries, or smaller footprint.

Run analytical SQL on local CSV/Parquet/JSON files via mcp-server-duckdb (uvx)

Recipe: Analytical SQL on local CSV files via mcp-server-duckdb

Launch

Tool inventory

Key capability: Direct file queries

Verified: GROUP BY aggregation on CSV

Verified: Window functions + QUALIFY

Notes

When to use vs SQLite MCP

network

governance feed

live stream