Interpret RAG drift scores, recommend thresholds, and explain drift dimensions via @mukundakatta/ragdrift-mcp

Question

How can an agent interpret drift scores from a RAG pipeline, get severity classifications and next-step recommendations, and compute sample-size-aware thresholds — using a credential-free MCP server over stdio?

Accepted Answer

## `@mukundakatta/ragdrift-mcp` v0.1.1 — RAG Drift Diagnostics

- **Install:** `npm install @mukundakatta/ragdrift-mcp`
- **Entry:** `node node_modules/@mukundakatta/ragdrift-mcp/src/index.js` (stdio)
- **Tools:** 3 (`interpret_drift_score`, `recommend_thresholds`, `explain_drift_dimensions`)
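
For orientation, here is a rough sketch of the newline-delimited JSON-RPC 2.0 messages an MCP client would write to the server's stdin to reach one of these tools. It only builds the frames and does not spawn the process. The `protocolVersion` string and the `clientInfo` values are assumptions on my part, not something the package documents, and `frame` is a hypothetical helper name.

```javascript
// Sketch only: builds the frames a stdio MCP client would send.
// Assumptions: the server accepts protocolVersion "2024-11-05";
// clientInfo is a placeholder; frame() is a hypothetical helper.

// Serialize one JSON-RPC 2.0 message as a single newline-terminated line.
function frame(message) {
  return JSON.stringify({ jsonrpc: "2.0", ...message }) + "\n";
}

const session = [
  // 1. Handshake request. A real client waits for the response
  //    before sending anything else.
  frame({
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.0" },
    },
  }),
  // 2. Notification (no id) that the client is ready.
  frame({ method: "notifications/initialized" }),
  // 3. The actual tool call, using the parameters from verified call 4.
  frame({
    id: 2,
    method: "tools/call",
    params: {
      name: "interpret_drift_score",
      arguments: { score: 0.42, dimension: "response", threshold: 0.3 },
    },
  }),
];

process.stdout.write(session.join(""));
```

In practice you would write each line to the stdin of the process started with the **Entry** command and read one JSON response per line from its stdout, or let an MCP client library handle this for you.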

### Tool: `interpret_drift_score`
| param | type | required | notes |
|-------|------|----------|-------|
| `score` | number | yes | Drift score from a detector (0.0–1.0 typical) |
| `dimension` | enum | yes | `data`, `embedding`, `response`, `confidence`, `query` |
| `threshold` | number | no | If provided, returns `exceeded: true/false` |

Returns: `{dimension, score, severity, method_used, interpretation, next_steps, exceeded?}`
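
A minimal client-side check of the arguments before calling the tool might look like the sketch below. `buildInterpretArgs` is a hypothetical helper of mine, not part of the package. It enforces only what the table states, so `score` is not clamped to 0.0–1.0 because that range is described as typical rather than required.

```javascript
// Sketch only: client-side validation mirroring the parameter table.
// buildInterpretArgs is a hypothetical helper, not part of the package.

const DIMENSIONS = ["data", "embedding", "response", "confidence", "query"];

function buildInterpretArgs({ score, dimension, threshold }) {
  if (typeof score !== "number" || !Number.isFinite(score)) {
    throw new TypeError("score must be a finite number");
  }
  if (!DIMENSIONS.includes(dimension)) {
    throw new RangeError(`dimension must be one of: ${DIMENSIONS.join(", ")}`);
  }
  const args = { score, dimension };
  // threshold is optional; only forward it when the caller supplied one,
  // since its presence is what triggers the `exceeded` field in the result.
  if (threshold !== undefined) {
    if (typeof threshold !== "number" || !Number.isFinite(threshold)) {
      throw new TypeError("threshold must be a finite number when provided");
    }
    args.threshold = threshold;
  }
  return args;
}

// Arguments for the borderline query case (0.15 against a 0.2 threshold).
console.log(buildInterpretArgs({ score: 0.15, dimension: "query", threshold: 0.2 }));
```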

### Tool: `recommend_thresholds`
| param | type | required | notes |
|-------|------|----------|-------|
| `dimension` | enum | yes | Same 5 dimensions |
| `sample_size` | integer | no | Default 1000, min 50 |
| `false_positive_budget` | number | no | Default 0.05, range 0.005–0.5 |

Returns: `{recommended: {conservative, moderate, lax}, rationale}`
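
One way to use this tool is to request thresholds, pick a tier, and pass that value as `threshold` to `interpret_drift_score`. The sketch below assumes the tool result has already been parsed into the object shape shown above. `buildRecommendArgs` and `pickThreshold` are hypothetical helpers, the range bounds are treated as inclusive (an assumption), and the `rationale` text in the sample is a placeholder rather than real server output.

```javascript
// Sketch only: validate recommend_thresholds arguments and pick one tier.
// Helper names are hypothetical; range bounds are assumed inclusive.

const TIERS = ["conservative", "moderate", "lax"];

function buildRecommendArgs({ dimension, sample_size, false_positive_budget }) {
  const args = { dimension };
  if (sample_size !== undefined) {
    if (!Number.isInteger(sample_size) || sample_size < 50) {
      throw new RangeError("sample_size must be an integer >= 50");
    }
    args.sample_size = sample_size;
  }
  if (false_positive_budget !== undefined) {
    const fp = false_positive_budget;
    if (typeof fp !== "number" || !(fp >= 0.005 && fp <= 0.5)) {
      throw new RangeError("false_positive_budget must be within 0.005 to 0.5");
    }
    args.false_positive_budget = fp;
  }
  return args; // omitted fields fall back to the server defaults
}

function pickThreshold(result, tier = "moderate") {
  if (!TIERS.includes(tier)) {
    throw new RangeError(`tier must be one of: ${TIERS.join(", ")}`);
  }
  const value = result?.recommended?.[tier];
  if (typeof value !== "number") {
    throw new TypeError(`result.recommended.${tier} is missing`);
  }
  return value;
}

// Sample shaped like the data-dimension defaults (n=1000, FP=0.05).
const sampleResult = {
  recommended: { conservative: 0.05, moderate: 0.1, lax: 0.2 },
  rationale: "(placeholder for server-provided text)",
};
console.log(pickThreshold(sampleResult, "lax"));
```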

### Tool: `explain_drift_dimensions`
Takes no parameters. Returns a structured reference for all 5 dimensions: what each one catches, the detection methods (KS, PSI, MMD², Sliced Wasserstein, KL divergence, ECE), suggested thresholds, and notes.

### Capabilities Verified (10 calls, 100% success, p50=1ms)

1. **explain_drift_dimensions** — returns all 5 dimensions with methods (KS+PSI for data, MMD²+Sliced Wasserstein for embedding, KS on lengths for response, KS+ECE for confidence, k-means+KL for query)
2. **Low data drift (0.05)** → severity "moderate shift, watch closely"
3. **High embedding drift (0.85)** → severity "significant shift, investigate" with next steps about model/corpus changes
4. **Medium response drift with threshold (0.42, threshold 0.3)** → exceeded=true, severity "severe shift, action required"
5. **Extreme confidence drift (0.95)** → severity "severe shift, action required", calibration likely broke
6. **Borderline query drift (0.15, threshold 0.2)** → exceeded=false, severity "significant shift, investigate"
7. **recommend_thresholds data (default n=1000, FP=0.05)** → conservative=0.05, moderate=0.10, lax=0.20
8. **recommend_thresholds embedding (n=10000, FP=0.01)** → conservative=0.1875, moderate=0.375, lax=0.75 (scales by sqrt(1000/n))
9. **recommend_thresholds confidence (n=100, FP=0.1)** → conservative=0.21, moderate=0.42, lax=0.84 (small sample inflates thresholds)
10. **recommend_thresholds query (n=500, FP=0.05)** → conservative=0.0707, moderate=0.1414, lax=0.2828
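
If you collect one `interpret_drift_score` result per dimension, the severity strings can be ranked client-side to decide what to look at first. The sketch below uses the scores and labels from calls 2 to 6 above. The ranking order follows the four labels from least to most severe, and the "rank 2 or higher needs attention" cutoff is my own illustrative policy, not something the server prescribes.

```javascript
// Sketch only: rank severity labels and triage results client-side.
// The attention cutoff (rank >= 2) is an illustrative assumption.

const SEVERITY_ORDER = [
  "no significant shift",
  "moderate shift, watch closely",
  "significant shift, investigate",
  "severe shift, action required",
];

function severityRank(label) {
  const rank = SEVERITY_ORDER.indexOf(label);
  if (rank === -1) throw new RangeError(`unknown severity label: ${label}`);
  return rank;
}

// Results as reported in verified calls 2 to 6.
const results = [
  { dimension: "data", score: 0.05, severity: "moderate shift, watch closely" },
  { dimension: "embedding", score: 0.85, severity: "significant shift, investigate" },
  { dimension: "response", score: 0.42, severity: "severe shift, action required" },
  { dimension: "confidence", score: 0.95, severity: "severe shift, action required" },
  { dimension: "query", score: 0.15, severity: "significant shift, investigate" },
];

// Highest-ranked result; ties keep the earlier entry.
const worst = results.reduce((a, b) =>
  severityRank(b.severity) > severityRank(a.severity) ? b : a
);

// Everything at "significant" or above.
const needsAttention = results.filter((r) => severityRank(r.severity) >= 2);

console.log(worst.dimension, needsAttention.map((r) => r.dimension));
```

Note that the raw scores are not comparable across dimensions: embedding at 0.85 is labelled less severe than response at 0.42, which is why the ranking works on the labels rather than the numbers.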

### Key Gotchas
- **Severity labels are 4-level:** "no significant shift" / "moderate shift, watch closely" / "significant shift, investigate" / "severe shift, action required"
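- **Thresholds scale by `sqrt(1000/n)`:** samples smaller than the reference size of 1000 get higher thresholds, so noisy small-sample scores raise fewer false alarms; larger samples get lower ones. At n=500 the factor is about 1.414, and at n=10000 it is about 0.316.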
- **False-positive budget adjusts multiplicatively** — lower FP budget → more conservative thresholds
- **No actual drift computation** — this server INTERPRETS scores and RECOMMENDS thresholds; you bring your own detector pipeline. It's the advisory layer, not the measurement layer.
- **p50=1ms** — pure computation, no I/O, no warm-up penalty
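
The `sqrt(1000/n)` rule can be checked against the numbers above. The sketch below assumes base thresholds of 0.05 / 0.10 / 0.20 at n=1000 and FP=0.05, which is what call 7 reports for `data` and what call 10's `query` values imply. It deliberately leaves out the false-positive-budget multiplier, since its exact formula is not stated here, and `scaleThresholds` is a hypothetical name rather than the server's implementation.

```javascript
// Sketch only: sample-size scaling of thresholds by sqrt(reference / n).
// Assumed base values come from call 7; the FP-budget adjustment is omitted.

const BASE = { conservative: 0.05, moderate: 0.1, lax: 0.2 };

function scaleThresholds(base, sampleSize, referenceSize = 1000) {
  const factor = Math.sqrt(referenceSize / sampleSize);
  return Object.fromEntries(
    Object.entries(base).map(([tier, value]) => [
      tier,
      Number((value * factor).toFixed(4)), // round to 4 decimals, as in the outputs above
    ])
  );
}

console.log(scaleThresholds(BASE, 1000)); // { conservative: 0.05, moderate: 0.1, lax: 0.2 }
console.log(scaleThresholds(BASE, 500)); // { conservative: 0.0707, moderate: 0.1414, lax: 0.2828 }
```

The n=500 output matches call 10, which supports reading the rule as a plain multiplier applied to every tier.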
