Convert HTML, CSV, text, and web pages to clean Markdown via markitdown-mcp (Microsoft MarkItDown) — 1 tool, 4 URI schemes
How do I use markitdown-mcp to convert documents from various sources and formats into clean Markdown, supporting file:, data:, http:, and https: URI schemes?
Verified recipe: markitdown-mcp v0.0.1a4 — convert documents to Markdown via URI
Package: markitdown-mcp (PyPI, Microsoft)
Launch: markitdown-mcp (stdio transport, no auth, zero config)
Install: uv pip install markitdown-mcp (also installs the markitdown library)
1 tool: convert_to_markdown(uri: string), which accepts file:, data:, http:, and https: URIs
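For a client that talks to the server without an SDK, each probe in this recipe corresponds to one MCP `tools/call` request sent over stdio. The following is a minimal sketch of that message, assuming the standard MCP JSON-RPC 2.0 framing. The helper name `build_convert_request` and the request id are illustrative, and the initialize handshake that must precede the first call is omitted.

```python
import json

def build_convert_request(uri: str, request_id: int = 1) -> str:
    """Serialize one tools/call request for convert_to_markdown (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "convert_to_markdown",
            "arguments": {"uri": uri},
        },
    }
    # json.dumps emits no raw newlines by default, so the result can be
    # written as a single line followed by "\n" on the server's stdin.
    return json.dumps(message)

print(build_convert_request("file:///tmp/test.html"))
```

In practice an MCP client library would normally build and send this message; the sketch only makes the shape of the call explicit.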
Probe 1 — Local HTML file (file: URI)
→ convert_to_markdown({uri: "file:///tmp/test.html"})
← "# Welcome to MarkItDown\n\nThis is a **test** document with *various* HTML elements.\n\n## Features\n\n* Convert HTML to Markdown\n* Support for tables\n* Code blocks\n\n## Code Example\n\n```\ndef hello():\n print(\"Hello, World!\")\n```\n\n## Data Table\n\n| Name | Age | City |\n| --- | --- | --- |\n| Alice | 30 | New York |\n| Bob | 25 | London |\n| Charlie | 35 | Tokyo |\n\n> This is a notable quote from the document.\n\nVisit [Example.com](https://example.com) for more info."
Latency: 53ms
Conversion quality: bold/italic, tables (GFM pipe format), code blocks, blockquotes, links, and lists all converted correctly. Tables are rendered as proper pipe-delimited Markdown, unlike html-to-markdown-mcp, which flattens them.
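The `file:` URI in Probe 1 can be built from a local path rather than typed by hand, which matters once paths contain spaces or other characters that need percent-encoding. A small sketch, assuming an absolute POSIX path; `to_file_uri` is a hypothetical helper, not part of markitdown-mcp:

```python
from urllib.parse import quote

def to_file_uri(absolute_path: str) -> str:
    """Turn an absolute POSIX path into a file: URI (hypothetical helper)."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # quote() keeps "/" intact and percent-encodes spaces and other unsafe characters.
    return "file://" + quote(absolute_path)

print(to_file_uri("/tmp/test.html"))         # file:///tmp/test.html
print(to_file_uri("/tmp/sales report.csv"))  # file:///tmp/sales%20report.csv
```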
Probe 2 — CSV file → Markdown table
→ convert_to_markdown({uri: "file:///tmp/sales.csv"})
← "| Product | Q1 | Q2 | Q3 | Q4 |\n| --- | --- | --- | --- | --- |\n| Widget A | 100 | 150 | 200 | 180 |\n| Widget B | 80 | 90 | 120 | 110 |\n| Widget C | 200 | 220 | 250 | 300 |"
Latency: 10ms
CSV is auto-detected and converted to a GFM pipe table. No special parameters are needed; the file extension triggers format detection.
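To make the output shape concrete, the sketch below reproduces the Probe 2 table with Python's standard `csv` module. It illustrates the CSV-to-pipe-table mapping only and is not MarkItDown's own implementation. The input string is reconstructed from the Probe 2 output, since the original sales.csv is not shown, and `csv_to_gfm_table` is a hypothetical name.

```python
import csv
import io

def csv_to_gfm_table(csv_text: str) -> str:
    """Render CSV text as a GFM pipe table, treating the first row as the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    def fmt(cells):
        return "| " + " | ".join(cells) + " |"

    # Header row, separator row, then one line per data row.
    lines = [fmt(header), fmt(["---"] * len(header))]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)

# Assumed input, inferred from the Probe 2 output.
sales_csv = (
    "Product,Q1,Q2,Q3,Q4\n"
    "Widget A,100,150,200,180\n"
    "Widget B,80,90,120,110\n"
    "Widget C,200,220,250,300\n"
)
print(csv_to_gfm_table(sales_csv))
```

The sketch assumes at least one row and does not escape pipe characters inside cells.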
Probe 3 — JSON file (passthrough)
→ convert_to_markdown({uri: "file:///tmp/data.json"})
← '{\n "name": "MarkItDown Test",\n "version": "1.0.0",\n "features": ["html", "pdf", "docx", "xlsx"],\n "config": {\n "output": "markdown",\n "strict": true\n }\n}'
Latency: 10ms
JSON passes through as-is, with no structural conversion to Markdown; the tool returns the raw JSON text.
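Because the JSON comes back untouched, a caller that wants Markdown has to wrap it client-side. One possible workaround, sketched under the assumption that a fenced `json` block is an acceptable rendering; `wrap_json_as_markdown` is a hypothetical helper and not something the server provides:

```python
import json

# Built programmatically so this example contains no literal fence line.
FENCE = "`" * 3

def wrap_json_as_markdown(raw: str) -> str:
    """Validate raw JSON text and return it as a fenced Markdown code block."""
    parsed = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    pretty = json.dumps(parsed, indent=2)
    return f"{FENCE}json\n{pretty}\n{FENCE}"

print(wrap_json_as_markdown('{"name": "MarkItDown Test", "version": "1.0.0"}'))
```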
Probe 4 — data: URIs (inline content)
→ convert_to_markdown({uri: "data:text/html;base64,PGgxPkRhdGEgVVJJIFRlc3Q8L2gxPjxwPlRoaXMgaXMgaW5saW5lIDxiPkhUTUw8L2I+IGNvbnRlbnQuPC9wPjx1bD48bGk+SXRlbSAxPC9saT48bGk+SXRlbSAyPC9saT48L3VsPg=="})
← "# Data URI Test\n\nThis is inline **HTML** content.\n\n* Item 1\n* Item 2"
Latency: 8ms
→ convert_to_markdown({uri: "data:text/csv;base64,TmFtZSxTY29yZQpBbGljZSw5NQpCb2IsODcKQ2hhcmxpZSw5Mg=="})
← "| Name | Score |\n| --- | --- |\n| Alice | 95 |\n| Bob | 87 |\n| Charlie | 92 |"
Latency: 12ms
data: URIs worked for both formats tested here (HTML and CSV): base64-encode the content inline and set the MIME type. CSV data URIs also convert to tables.
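The data: URIs in Probe 4 can be generated with the standard library. This sketch rebuilds the CSV URI from the second call; the CSV text is what that URI's base64 payload decodes to, and `to_data_uri` is a hypothetical helper name.

```python
import base64

def to_data_uri(content: str, mime_type: str) -> str:
    """Base64-encode text content and prefix it with the MIME type."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

csv_text = "Name,Score\nAlice,95\nBob,87\nCharlie,92"
print(to_data_uri(csv_text, "text/csv"))
# data:text/csv;base64,TmFtZSxTY29yZQpBbGljZSw5NQpCb2IsODcKQ2hhcmxpZSw5Mg==
```

Passing `"text/html"` as the MIME type produces the HTML form used in the first call.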
Probe 5 — Remote URL (https:)
→ convert_to_markdown({uri: "https://example.com"})
← "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)"
Latency: 289ms
Clean fetch and conversion. Latency is network-bound.
Probe 6 — Error handling (3 calls)
→ {uri: "file:///tmp/nonexistent-file.html"} → "Error: [Errno 2] No such file or directory: '/tmp/nonexistent-file.html'" (14ms)
→ {uri: "ftp://example.com"} → "Error: Unsupported URI scheme: ftp. Supported schemes are: file:, data:, http:, https:" (6ms)
→ {uri: "data:text/html;base64,"} → "" (empty string, stderr warning about replacement chars) (14ms)
All errors are returned as graceful text responses (not MCP error codes). An invalid scheme returns an explicit list of supported schemes.
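Since failures arrive as ordinary text rather than MCP error codes, the caller has to inspect the returned string. A heuristic sketch based only on the responses observed in Probe 6 (the `Error:` prefix and the empty string); the function names are hypothetical, and a legitimate document that happens to start with "Error:" would be misclassified.

```python
def is_error_response(text: str) -> bool:
    """Heuristic: both error messages observed in Probe 6 start with 'Error:'."""
    return text.startswith("Error:")

def check_result(text: str) -> str:
    """Return the Markdown unchanged, or raise if it looks like a failure."""
    if is_error_response(text):
        raise RuntimeError(text)
    if text == "":
        # Observed for an empty data: payload; treated here as a failure.
        raise ValueError("conversion returned empty output")
    return text

print(check_result("# Example Domain"))
print(is_error_response("Error: Unsupported URI scheme: ftp. Supported schemes are: file:, data:, http:, https:"))
```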
⚠️ KEY GOTCHAS
- Code blocks lose their language annotation: `<code class="language-python">` becomes a plain fenced block without the `python` tag
- JSON is NOT converted: it passes through as raw text (no structural transformation to Markdown)
- data: URIs must be base64-encoded with a proper MIME type prefix (`data:text/html;base64,...`)
- Empty content returns an empty string with a stderr warning about replacement characters
- **`markitdown-mcp[a
{
  "server": "markitdown-mcp",
  "version": "0.0.1a4",
  "source": "PyPI",
  "author": "Microsoft",
  "transport": "stdio",
  "tools": ["convert_to_markdown"],
  "uri_schemes": ["file:", "data:", "http:", "https:"],
  "calls": 12,
  "success_rate": "100%",
  "p50_ms": 12,
  "min_ms": 6,
  "max_ms": 588,
  "formats_tested": ["HTML", "CSV", "JSON", "plain text", "data:text/html", "data:text/csv", "https URL"],
  "key_gotchas": [
    "code blocks lose language annotation",
    "JSON passes through as-is (no conversion)",
    "data: URIs must be base64 with MIME prefix",
    "markitdown-mcp[all] extra does NOT exist",
    "server prints Processing request to stderr"
  ],
  "conversion_quality": {
    "html_tables": "GFM pipe tables (correct)",
    "bold_italic": "preserved",
    "links": "preserved",
    "code_blocks": "preserved but no language tag",
    "csv": "auto-converts to markdown table",
    "json": "passthrough (no conversion)",
    "blockquotes": "preserved",
    "lists": "preserved"
  }
}