Convert raw HTML or URLs to clean markdown via html-to-markdown-mcp (npx)
How can an agent convert raw HTML (from a scraper, API response, or email) to clean markdown, preserving headings, lists, code blocks, and links, without relying on mcp-server-fetch (which only takes URLs)? The solution also needs truncation for large pages and the ability to save output to disk.
Recipe: Convert HTML to Markdown via html-to-markdown-mcp (npx)
Surface
- Package: `html-to-markdown-mcp` (npm)
- Launch: `npx -y html-to-markdown-mcp` (stdio)
- Auth: none
- Tools: `html_to_markdown`, `save_markdown` (2 tools)
- Engine: Turndown.js
What it does
Two complementary tools:
`html_to_markdown` — convert HTML to clean markdown:
- Accepts raw HTML (from scrapers, API responses, email bodies) OR a URL (fetches the page automatically)
- Preserves headings, bold/italic, lists, code blocks (with language hints), links
- Adds metadata header (source, title, timestamp); toggle with `includeMetadata: false`
- `maxLength` truncation for large pages (avoids blowing token limits)
- `saveToFile` to write full content to disk and return just a summary
`save_markdown` — persist any markdown string to a file on disk.
Parameters — html_to_markdown
| Param | Type | Required | Description |
|---|---|---|---|
| `html` | string | one of html/url | Raw HTML to convert |
| `url` | string | one of html/url | URL to fetch and convert |
| `includeMetadata` | boolean | no | Add source/title/timestamp header (default: true) |
| `maxLength` | number | no | Truncate output to N chars (default: no limit) |
| `saveToFile` | string | no | File path to save full content; returns summary instead |
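
As a rough illustration of how the parameters above map onto a request, the sketch below builds the JSON-RPC `tools/call` payload in Python. The helper name `build_html_to_markdown_call` is hypothetical, and the "exactly one of `html`/`url`" check is an assumption: the table only says "one of html/url", so the server may behave differently when both are sent.

```python
from typing import Any, Optional


def build_html_to_markdown_call(
    request_id: int,
    html: Optional[str] = None,
    url: Optional[str] = None,
    include_metadata: Optional[bool] = None,
    max_length: Optional[int] = None,
    save_to_file: Optional[str] = None,
) -> dict[str, Any]:
    """Build a JSON-RPC tools/call request for the html_to_markdown tool."""
    # Assumption: send exactly one of html/url (the table says "one of html/url").
    if (html is None) == (url is None):
        raise ValueError("provide exactly one of html or url")

    arguments: dict[str, Any] = {"html": html} if html is not None else {"url": url}

    # Optional parameters are left out when unset so the server defaults apply.
    if include_metadata is not None:
        arguments["includeMetadata"] = include_metadata
    if max_length is not None:
        arguments["maxLength"] = max_length
    if save_to_file is not None:
        arguments["saveToFile"] = save_to_file

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "html_to_markdown", "arguments": arguments},
    }
```

For example, `build_html_to_markdown_call(3, html="<h1>Hi</h1>", max_length=2000)` yields a request whose `arguments` contain only `html` and `maxLength`.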
Difference from mcp-server-fetch
mcp-server-fetch only accepts URLs. This server also accepts raw HTML strings, making it useful when you already have HTML from another source (API response, email, database, scraper output). It also offers `maxLength` truncation and a `saveToFile` option; the latter has no counterpart in mcp-server-fetch.
Verified trace (2026-06-13)
Input HTML:
<article>
<h1>Getting Started with MCP</h1>
<p>The <strong>Model Context Protocol</strong> (MCP) is an open standard...</p>
<h2>Key Features</h2>
<ul>
<li>Standardized tool discovery</li>
<li>Structured request/response</li>
<li>Multiple transport options (stdio, SSE, streamable HTTP)</li>
</ul>
<h2>Example</h2>
<pre><code class="language-json">{"method": "tools/call", "params": {"name": "hello"}}</code></pre>
<p>Learn more at <a href="https://modelcontextprotocol.io">modelcontextprotocol.io</a>.</p>
</article>

Output Markdown:
# Getting Started with MCP
**Source:** Unknown
**Saved:** 2026-06-13T06:12:25.218Z
---
# Getting Started with MCP
The **Model Context Protocol** (MCP) is an open standard for connecting AI assistants to external tools.
## Key Features
- Standardized tool discovery
- Structured request/response
- Multiple transport options (stdio, SSE, streamable HTTP)
## Example
```json
{"method": "tools/call", "params": {"name": "hello"}}
```

Learn more at [modelcontextprotocol.io](https://modelcontextprotocol.io).
### MCP handshake
- → initialize (protocolVersion: "2024-11-05")
- ← serverInfo: { name: "html-to-markdown", version: "1.0.0" }
- → tools/list
- ← 2 tools: `html_to_markdown`, `save_markdown`
- → tools/call `html_to_markdown` { html: "<article>..." }
- ← clean markdown with metadata header
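
The handshake can be sketched as the sequence of JSON-RPC messages a client would write to the server's stdin, one JSON object per line (the MCP stdio transport is newline-delimited). This is a minimal sketch, not the client used for the trace: the `clientInfo` values and the helper name `handshake_messages` are hypothetical, and the `notifications/initialized` message comes from the MCP lifecycle specification rather than from the trace itself.

```python
import json


def handshake_messages(html: str) -> list[str]:
    """Return the client-side messages for the handshake, one JSON line each."""
    messages = [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                # Hypothetical client identity, for illustration only.
                "clientInfo": {"name": "example-client", "version": "0.0.1"},
            },
        },
        # Notification (no id), sent after the initialize response arrives.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        {
            "jsonrpc": "2.0",
            "id": 3,
            "method": "tools/call",
            "params": {"name": "html_to_markdown", "arguments": {"html": html}},
        },
    ]
    # json.dumps escapes embedded newlines, so each message stays on one line.
    return [json.dumps(message) for message in messages]
```

A real client would wait for each response before sending the next request; the list here only shows the order and shape of the messages.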
Cold start is about 3 s (npx). Conversion itself is near-instant (8 ms in this trace) because Turndown.js runs in-process.

Raw request/response trace:

{
  "request": {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "html_to_markdown",
      "arguments": {
        "html": "<article><h1>Getting Started with MCP</h1><p>The <strong>Model Context Protocol</strong> (MCP) is an open standard for connecting AI assistants to external tools.</p><h2>Key Features</h2><ul><li>Standardized tool discovery</li><li>Structured request/response</li><li>Multiple transport options (stdio, SSE, streamable HTTP)</li></ul><h2>Example</h2><pre><code class=\"language-json\">{\"method\": \"tools/call\", \"params\": {\"name\": \"hello\"}}</code></pre><p>Learn more at <a href=\"https://modelcontextprotocol.io\">modelcontextprotocol.io</a>.</p></article>"
      }
    }
  },
  "response": {
    "result": {
      "content": [
        {
          "type": "text",
          "text": "# Getting Started with MCP **Source:** Unknown **Saved:** 2026-06-13T06:12:25.218Z --- # Getting Started with MCP The **Model Context Protocol** (MCP) is an open standard for connecting AI assistants to external tools. ## Key Features - Standardized tool discovery - Structured request/response - Multiple transport options (stdio, SSE, streamable HTTP) ## Example ```json {\"method\": \"tools/call\", \"params\": {\"name\": \"hello\"}} ``` Learn more at [modelcontextprotocol.io](https://modelcontextprotocol.io)."
        }
      ]
    },
    "jsonrpc": "2.0",
    "id": 3
  },
  "latency_ms": 8,
  "server": "[email protected]",
  "transport": "stdio",
  "launcher": "npx"
}
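
To pull the converted markdown out of a response shaped like the one in this trace, a client can join the `text` items under `result.content`. The sketch below assumes that shape; the helper name `extract_markdown` is hypothetical, and the `error` and `isError` checks follow the general JSON-RPC and MCP tool-result conventions rather than anything observed in this trace.

```python
from typing import Any


def extract_markdown(response: dict[str, Any]) -> str:
    """Return the concatenated text content of a tools/call response."""
    # A JSON-RPC level failure carries an "error" object instead of "result".
    if "error" in response:
        raise RuntimeError(f"JSON-RPC error: {response['error']}")

    result = response.get("result", {})
    # MCP tool results may flag tool-level failures with isError.
    if result.get("isError"):
        raise RuntimeError("tool reported an error")

    # Keep only text items; other content types are ignored in this sketch.
    parts = [
        item["text"]
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts)
```

Passing the `response` object from the trace returns the markdown string that starts with the metadata header.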