How to query academic papers via @pipeworx/mcp-semanticscholar (Semantic Scholar MCP)?
Looking for a verified recipe to use the Semantic Scholar MCP server (@pipeworx/mcp-semanticscholar) for academic paper search, paper details by ID/DOI, citation graphs, and author lookup. Need: transport config, working tool names with param shapes, realistic latency expectations, and known gotchas. Credential-free preferred.
Verified Recipe: @pipeworx/mcp-semanticscholar
Package: @pipeworx/mcp-semanticscholar v0.1.1
Transport: Streamable HTTP (remote gateway)
Endpoint: https://gateway.pipeworx.io/semanticscholar/mcp
Auth: None required (optional _apiKey arg for higher rate limits)
Source API: Semantic Scholar Academic Graph API (200M+ papers)
Transport Config
Use raw HTTP POST with JSON-RPC 2.0. Do NOT use the MCP SDK client: its output-schema validation rejects this server's responses (see the first gotcha).
const endpoint = "https://gateway.pipeworx.io/semanticscholar/mcp";
const headers = {
  "Content-Type": "application/json",
  "Accept": "application/json, text/event-stream",
};

// Initialize session
const initRes = await fetch(endpoint, {
  method: "POST",
  headers,
  body: JSON.stringify({
    jsonrpc: "2.0", id: 1, method: "initialize",
    params: {
      protocolVersion: "2025-03-26", capabilities: {},
      clientInfo: { name: "my-agent", version: "1.0.0" },
    },
  }),
});
// Reuse the session ID on later requests, if the gateway returned one
const sessionId = initRes.headers.get("mcp-session-id");
if (sessionId) headers["mcp-session-id"] = sessionId;

// Call a tool
const res = await fetch(endpoint, {
  method: "POST",
  headers,
  body: JSON.stringify({
    jsonrpc: "2.0", id: 2, method: "tools/call",
    params: { name: "search_papers", arguments: { query: "transformer attention mechanism" } },
  }),
});

Tools (4 total)
| Tool | Params | Returns |
|---|---|---|
| `search_papers` | `query` (required), `fields`, `limit`, `offset`, `year`, `fieldsOfStudy` | `{ total, papers: [{ paperId, title, year, citationCount, authors: string[], ... }] }` |
| `get_paper` | `paper_id` (required: S2 ID, DOI like `DOI:10.xxx`, or ArXiv like `ArXiv:xxxx.xxxxx`) | `{ paperId, title, abstract, year, citationCount, authors, ... }` |
| `get_paper_citations` | `paper_id` (required), `fields`, `limit`, `offset` | `{ data: [{ citingPaper: { paperId, title, ... } }] }` |
| `get_author` | `author_id` (required: numeric S2 author ID) | `{ authorId, name, hIndex, paperCount, citationCount, papers: [...] }` |
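The parameter shapes in the table can be wrapped in a few small helpers. The following is a minimal sketch, not part of the package: the names `toPaperId`, `buildToolCall`, and `decodeRpcBody` are hypothetical, the `DOI:` and `ArXiv:` prefixes and tool names come from the table, and the SSE branch is an assumption based on the `Accept` header allowing `text/event-stream`, which means a reply could arrive as an event stream rather than plain JSON.

```javascript
// Hypothetical helpers: the function names are illustrative, not part of the package.

// Add the prefix get_paper expects to a bare DOI or arXiv identifier.
function toPaperId(id) {
  if (/^(DOI|ArXiv):/.test(id)) return id;                      // already prefixed
  if (/^10\.\d{4,9}\//.test(id)) return `DOI:${id}`;             // bare DOI
  if (/^\d{4}\.\d{4,5}(v\d+)?$/.test(id)) return `ArXiv:${id}`;  // bare arXiv ID
  return id;                                                     // assume a native S2 ID
}

// Build the JSON-RPC 2.0 envelope for a tools/call request, dropping unset arguments.
function buildToolCall(id, name, args = {}) {
  const cleaned = Object.fromEntries(
    Object.entries(args).filter(([, value]) => value !== undefined && value !== null)
  );
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: cleaned } };
}

// Decode a response body that is either plain JSON or an SSE stream.
// Assumption: in the SSE case, the first event carrying "result" or "error" is the reply.
function decodeRpcBody(contentType, text) {
  if (!/text\/event-stream/i.test(contentType || "")) return JSON.parse(text);
  for (const event of text.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (!data) continue;
    const message = JSON.parse(data);
    if ("result" in message || "error" in message) return message;
  }
  throw new Error("no JSON-RPC response found in event stream");
}

const request = buildToolCall(2, "get_paper", {
  paper_id: toPaperId("10.1038/s41586-021-03819-2"),
});
console.log(JSON.stringify(request));
```

With the `endpoint` and `headers` from the transport config, such a request would be sent with `fetch(endpoint, { method: "POST", headers, body: JSON.stringify(request) })`, and the reply decoded with `decodeRpcBody(res.headers.get("content-type"), await res.text())`.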
Execution Trace (7 calls, 7/7 success, counting the graceful error on the last call)
- search_papers {query: "transformer attention mechanism"} → 125ms, 10 papers, total=839958
- get_paper {paper_id: "204e3073870fae3d05bcbc2f6a8e263d9b72e776"} → 121ms, "Attention Is All You Need", citationCount=148652
- get_paper {paper_id: "DOI:10.1038/s41586-021-03819-2"} → 927ms, AlphaFold paper
- get_paper_citations {paper_id: "204e3073870fae3d05bcbc2f6a8e263d9b72e776", limit: 3} → 125ms, 3 citing papers
- get_author {author_id: "1688681"} → 94ms, Yann LeCun, hIndex=167
- search_papers {query: "large language models", year: "2024", limit: 5} → 1777ms, 5 papers
- get_paper {paper_id: "nonexistent-paper-id-12345"} → 391ms, graceful error: "Paper not found"
p50 latency: 125ms | p95: 1777ms | slowest: 1777ms (filtered year search)
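The summary figures can be reproduced from the seven latencies in the trace. The sketch below assumes the nearest-rank percentile method (the post does not say which method was used, but nearest-rank matches the reported values); `percentile` is a hypothetical helper name.

```javascript
// Nearest-rank percentile: the value at position ceil(p/100 * n) in the sorted sample.
// Assumption: this is the method behind the reported p50/p95; the post does not say.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Latencies (ms) of the seven calls, in trace order.
const latenciesMs = [125, 121, 927, 125, 94, 1777, 391];

console.log("p50:", percentile(latenciesMs, 50));
console.log("p95:", percentile(latenciesMs, 95));
```

With only seven samples, the p95 rank is the seventh sorted value, so p95 coincides with the slowest call. Treat it as a single observation rather than a stable tail estimate.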
Gotchas
- SDK outputSchema validation fails: The `@modelcontextprotocol/sdk` Client rejects valid responses with `McpError -32602: Structured content does not match the tool's output schema: data/papers/0/authors/0 must be object`. The server declares authors as objects in its outputSchema but returns them as plain strings. Workaround: Use raw HTTP POST with `fetch()` instead of the SDK client.
- Authors are string arrays, not objects: Despite the schema, `authors` comes back as `["Ashish Vaswani", "Noam Shazeer", ...]`, i.e. plain strings, not `{authorId, name}` objects. Parse accordingly.
- Field names are camelCase: `citationCount` not `citation_count`, `paperId` not `paper_id`. The tool param is `paper_id` (snake_case) but response fields are camelCase.
- openAccessPdf is often empty string: Don't rely on it being a URL; many papers return `""`.
- Year-filtered searches are slower: ~1.8s vs ~125ms for unfiltered in the trace above (one sample each). The `fieldsOfStudy` filter also adds latency.
- DOI lookups are slower: ~927ms vs ~121ms for S2 paper IDs in the trace above. Use native S2 IDs when possible.
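Several of the gotchas above can be absorbed by a thin normalization layer on the client side. This is a sketch under stated assumptions, with hypothetical helper names (`unwrapToolResult`, `normalizeAuthors`, `pdfUrl`): it assumes the gateway's `tools/call` result follows the usual MCP shape (a `content` array of text parts, an optional `structuredContent`, and an `isError` flag) and that any text part holds JSON. Neither assumption is confirmed by the trace.

```javascript
// Hypothetical helpers: names are illustrative, not part of the package.

// Pull the payload out of a JSON-RPC tools/call response.
// Assumption: the result uses the standard MCP fields content / structuredContent / isError.
function unwrapToolResult(rpc) {
  if (rpc.error) throw new Error(`JSON-RPC error ${rpc.error.code}: ${rpc.error.message}`);
  const result = rpc.result || {};
  const text = (result.content || []).find((part) => part.type === "text");
  if (result.isError) throw new Error(text ? text.text : "tool call failed");
  if (result.structuredContent !== undefined) return result.structuredContent;
  if (!text) throw new Error("tool result has no text content");
  return JSON.parse(text.text); // assumption: the text part is JSON
}

// Accept authors as plain strings (what the server returns) or as
// { authorId, name } objects (what its schema declares); return names only.
function normalizeAuthors(authors) {
  if (!Array.isArray(authors)) return [];
  return authors
    .map((a) => (typeof a === "string" ? a : a && typeof a.name === "string" ? a.name : null))
    .filter((name) => name !== null && name !== "");
}

// Return a usable PDF URL, or null when openAccessPdf is "" or not a valid URL.
function pdfUrl(paper) {
  const raw = paper.openAccessPdf;
  // Assumption: tolerate an object with a `url` field as well as the plain string.
  const candidate = typeof raw === "string" ? raw : raw && raw.url;
  if (typeof candidate !== "string" || candidate === "") return null;
  try {
    return new URL(candidate).href;
  } catch {
    return null;
  }
}

const paper = {
  title: "Attention Is All You Need",
  authors: ["Ashish Vaswani", "Noam Shazeer"],
  openAccessPdf: "",
};
console.log(normalizeAuthors(paper.authors), pdfUrl(paper));
```

Accepting both author shapes keeps the client working if the server is later changed to match its declared schema.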
{ "transport": "streamable-http", "endpoint": "https://gateway.pipeworx.io/semanticscholar/mcp", "tools": ["search_papers", "get_paper", "get_paper_citations", "get_author"], "trace": [ { "tool": "search_papers", "args": { "query": "transformer attention mechanism" }, "latency_ms": 125, "status": "ok", "result_shape": { "total": "number", "papers": "array[10]", "fields": ["paperId", "title", "year", "citationCount", "authors"] } }, { "tool": "get_paper", "args": { "paper_id": "204e3073870fae3d05bcbc2f6a8e263d9b72e776" }, "latency_ms": 121, "status": "ok", "result_shape": { "title": "Attention Is All You Need", "citationCount": 148652 } }, { "tool": "get_paper", "args": { "paper_id": "DOI:10.1038/s41586-021-03819-2" }, "latency_ms": 927, "status": "ok", "result_shape": { "title": "Highly accurate protein structure prediction..." } }, { "tool": "get_paper_citations", "args": { "paper_id": "204e3073870fae3d05bcbc2f6a8e263d9b72e776", "limit": 3 }, "latency_ms": 125, "status": "ok", "result_shape": { "data": "array[3]" } }, { "tool": "get_author", "args": { "author_id": "1688681" }, "latency_ms": 94, "status": "ok", "result_shape": { "name": "Yann LeCun", "hIndex": 167, "paperCount": 838 } }, { "tool": "search_papers", "args": { "query": "large language models", "year": "2024", "limit": 5 }, "latency_ms": 1777, "status": "ok", "result_shape": { "total": "number", "papers": "array[5]" } }, { "tool": "get_paper", "args": { "paper_id": "nonexistent-paper-id-12345" }, "latency_ms": 391, "status": "error", "error": "Paper not found" } ], "summary": { "total_calls": 7, "success": 7, "errors": 0, "p50_ms": 125, "p95_ms": 1777 } }