Search 200M+ academic papers, trace citations, and look up authors via @pipeworx/mcp-semanticscholar
How to query the Semantic Scholar Academic Graph (200M+ papers) using @pipeworx/mcp-semanticscholar — search papers, resolve by DOI/arXiv/CorpusId, trace citations, and look up authors. Keyless but heavily rate-limited.
@pipeworx/mcp-semanticscholar v0.1.1 — verified recipe
Install: npm install @pipeworx/mcp-semanticscholar
⚠️ Library-style MCP tool module, NOT a stdio server. It exports the `{tools, callTool}` McpToolExport interface and ships TypeScript source only (no dist). Because Node refuses to strip types from files under node_modules, copy src/index.ts to a location outside node_modules and run it with `node --experimental-strip-types --no-warnings`.
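A minimal sketch of driving a library-style export like this one. The `McpToolExportLike` interface and the `callTool(name, args)` signature are assumptions inferred from the `{tools, callTool}` description, not confirmed against the package source, and `stubModule` is a hypothetical stand-in so the sketch runs without the package or network access.

```typescript
// Assumed shape of the module's export; verify against src/index.ts before relying on it.
interface ToolDescriptor {
  name: string;
  description?: string;
}

interface McpToolExportLike {
  tools: ToolDescriptor[];
  callTool: (name: string, args: Record<string, unknown>) => Promise<unknown>;
}

// Call a tool by name, failing early if the module does not list it.
async function invokeTool(
  mod: McpToolExportLike,
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const known = mod.tools.map((t) => t.name);
  if (!known.includes(name)) {
    throw new Error(`unknown tool "${name}"; available: ${known.join(", ")}`);
  }
  return mod.callTool(name, args);
}

// Hypothetical stub: echoes its input instead of calling Semantic Scholar.
const stubModule: McpToolExportLike = {
  tools: [{ name: "get_paper" }],
  callTool: async (name, args) => ({ name, args }),
};
```

With the real module, `stubModule` would be replaced by whatever the copied src/index.ts exports; whether that is a default or a named export is not stated here and should be checked in the file itself.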
4 tools
| Tool | Required params | Description |
|---|---|---|
| `search_papers` | `query` | Search 200M+ papers by keyword. Optional: `limit` (max 25), `year` ("2024" or "2020-2024"), `fields_of_study` |
| `get_paper` | `paper_id` | Full metadata by S2 ID, or by `arXiv:...`, `DOI:...`, or `CorpusId:...` prefix |
| `get_paper_citations` | `paper_id` | Forward citation tracing: papers that CITE this one. Optional: `limit` (max 25) |
| `get_author` | `name` | Search authors. Returns up to 5 matches with h-index, citation count, paper count |
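The parameter constraints in the table can be checked locally before a call is spent against the rate limit. The sketch below is a hypothetical helper, not part of the package: `toPaperId` and `buildSearchArgs` are names invented here, and the only rules encoded are the ones listed above (limit capped at 25, year as "2024" or "2020-2024", the three ID prefixes). The lower bound of 1 on `limit` is an assumption.

```typescript
type IdKind = "s2" | "arxiv" | "doi" | "corpus";

// Prefix a raw identifier the way get_paper expects; bare S2 IDs pass through unchanged.
function toPaperId(kind: IdKind, value: string): string {
  const trimmed = value.trim();
  if (trimmed === "") throw new Error("paper_id must not be empty");
  switch (kind) {
    case "arxiv":
      return `arXiv:${trimmed}`;
    case "doi":
      return `DOI:${trimmed}`;
    case "corpus":
      return `CorpusId:${trimmed}`;
    default:
      return trimmed;
  }
}

interface SearchArgs {
  query: string;
  limit?: number;
  year?: string;
  fields_of_study?: string;
}

// Validate search_papers arguments against the documented constraints.
function buildSearchArgs(
  query: string,
  opts: { limit?: number; year?: string; fieldsOfStudy?: string } = {},
): SearchArgs {
  if (query.trim() === "") throw new Error("query is required");
  const args: SearchArgs = { query: query.trim() };
  if (opts.limit !== undefined) {
    // Clamp rather than reject: the table gives 25 as the maximum (minimum of 1 assumed).
    args.limit = Math.min(25, Math.max(1, Math.floor(opts.limit)));
  }
  if (opts.year !== undefined) {
    // Accept a single year ("2024") or a range ("2020-2024").
    if (!/^\d{4}(-\d{4})?$/.test(opts.year)) {
      throw new Error(`bad year filter: ${opts.year}`);
    }
    args.year = opts.year;
  }
  if (opts.fieldsOfStudy !== undefined) args.fields_of_study = opts.fieldsOfStudy;
  return args;
}
```

For example, `toPaperId("arxiv", "1706.03762")` yields the `arXiv:1706.03762` form used in the verified calls, and an oversized `limit` is clamped to 25 instead of being sent as-is.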
Verified calls (13 calls, 6 OK + 3 graceful errors + 4 rate-limited)
get_paper — 3 calls, all OK:
- `{paper_id: "arXiv:1706.03762"}` → "Attention Is All You Need", NeurIPS 2017, 181,048 citations, TLDR present, 8 authors (capped at 15), fieldsOfStudy: ["Computer Science"]. 602ms.
- `{paper_id: "DOI:10.1038/s41586-021-03819-2"}` → "Highly accurate protein structure prediction with AlphaFold", Nature 2021, 36,636 citations, open access PDF link present, 15 authors. 1087ms.
- `{paper_id: "CorpusId:215416146"}` → "S2ORC: The Semantic Scholar Open Research Corpus", ACL 2020, 795 citations. 1073ms.
get_paper_citations — 2 calls, both OK:
- Attention paper, limit 3 → 3 citing papers (all 2026, 0 citations — freshly published). 624ms.
- AlphaFold2, limit 5 → 5 citing papers spanning PIEZO1, quantum biology, neuro-symbolic AI. 628ms.
get_author — 2 calls (1 OK + 1 rate-limited):
- `{name: "Yoshua Bengio"}` → 5 matches. Top: authorId 1751762, 812 papers, 576,845 citations, h-index 213. 690ms.
- `{name: "Geoffrey Hinton"}` → 5 matches. ⚠️ FRAGMENTED: Hinton's profile is split across 5 author IDs (paperCount 4/4/7/2/2), none with the expected ~600 papers. Sum ≈ 59 papers. This is a known S2 disambiguation issue.
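A caller can at least detect a fragmented profile before trusting the top match. The sketch below is a rough heuristic, not anything Semantic Scholar or the module provides: it assumes each match exposes `authorId`, `name`, and `paperCount`, and it flags a result when no single profile holds most of the matched papers. The `checkFragmentation` name and the 0.8 threshold are arbitrary illustrations, and distinct people who share a name will also trip it.

```typescript
// Assumed subset of a get_author match; only the fields used here are declared.
interface AuthorMatch {
  authorId: string;
  name: string;
  paperCount: number;
}

interface FragmentationReport {
  totalPapers: number;
  largest: AuthorMatch | undefined;
  largestShare: number; // fraction of all matched papers held by the largest profile
  looksFragmented: boolean;
}

// Flag the result set when no single profile dominates the matches.
function checkFragmentation(matches: AuthorMatch[], minShare = 0.8): FragmentationReport {
  const totalPapers = matches.reduce((sum, m) => sum + m.paperCount, 0);
  let largest: AuthorMatch | undefined;
  for (const m of matches) {
    if (largest === undefined || m.paperCount > largest.paperCount) largest = m;
  }
  const largestShare =
    largest !== undefined && totalPapers > 0 ? largest.paperCount / totalPapers : 0;
  return {
    totalPapers,
    largest,
    largestShare,
    looksFragmented: matches.length > 1 && largestShare < minShare,
  };
}
```

When the flag is set, the safer course is to inspect each candidate `authorId` by hand instead of taking the first match as the canonical profile.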
search_papers — 5 calls, ALL rate-limited (429):
- Every `search_papers` call returned 429 even with 5-15 second delays between calls. The keyless search endpoint is the most aggressively rate-limited and appears to have a very low per-IP budget (possibly ~1 call/minute, or a bucket shared with other endpoints).
Error handling — 3 calls:
- Empty `paper_id` → `{error: "provide a paper_id"}` (instant, no network call).
- Empty `name` → `{error: "provide an author name"}` (instant).
- Nonexistent paper → 429 (a rate limit is indistinguishable from a 404 during a rate-limited session).
⚠️ CRITICAL: Rate limiting
The #1 gotcha. Without an API key, the Semantic Scholar API rate-limits aggressively:
- `search_papers` is the most restricted: it consistently returned 429 across 5 attempts with multi-second delays.
- `get_paper` and `get_paper_citations` have a slightly more generous budget (~3-4 calls before 429).
- `get_author` worked initially but hit 429 after a few paper endpoint calls.
- Rate limit is per-IP, cumulative across all endpoints.
- Cool-off period appears to be several minutes.
- Module returns helpful error: "retry in a moment or apply for a key for higher rate limits."
Recommendation: Apply for an API key at semanticscholar.org/product/api#api-key-form. The module supports it via args._apiKey (x-api-key header).
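Until a key is in place, 429s can be absorbed with a retry wrapper. A minimal sketch, assuming a rate-limited call comes back as a result object whose `error` string mentions 429 or "rate limit" (the quoted message above matches, but the full return shape is not documented here, so `isRateLimited` should be adjusted to the real output). Since 5-15 second delays were not enough in testing and the cool-off appears to be minutes, the illustrative defaults start at 30 seconds and cap at 5 minutes.

```typescript
// Exponential backoff delay in milliseconds, capped; attempt is 0-based.
function backoffDelayMs(attempt: number, baseMs = 30_000, capMs = 300_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Guess at how a rate-limited result looks; adjust to the module's actual output.
function isRateLimited(result: unknown): boolean {
  if (typeof result !== "object" || result === null) return false;
  const err = (result as { error?: unknown }).error;
  return typeof err === "string" && /429|rate limit/i.test(err);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry `call` while it reports rate limiting, up to maxAttempts tries in total.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxAttempts = 4,
  delayFn: (attempt: number) => number = backoffDelayMs,
): Promise<T> {
  let result = await call();
  for (let attempt = 0; attempt < maxAttempts - 1 && isRateLimited(result); attempt++) {
    await sleep(delayFn(attempt));
    result = await call();
  }
  return result; // may still be a rate-limit error if every attempt failed
}
```

With a key, the same wrapped call would carry `_apiKey` in its arguments, as described in the recommendation, and the wrapper should rarely need to wait.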
Key observations
- Different from arXiv servers — S2 indexes 200M+ papers across ALL disciplines (not just arXiv), has TLDR summaries, citation counts, h-index, and forward citation tracing.
- Paper ID prefixes work well: `arXiv:`, `DOI:`, `CorpusId:` all resolve correctly.
- TLDR summaries are genuinely useful: one-sentence generated summaries for quick paper triage.
- Author disambiguation is imperfect: major researchers like Geoffrey Hinton can be split across several author IDs (see the get_author results above).
{ "server": "@pipeworx/mcp-semanticscholar v0.1.1", "type": "library-style McpToolExport (not stdio)", "tools": ["search_papers", "get_paper", "get_paper_citations", "get_author"], "calls": 13, "success_rate": "69% (6 OK + 3 graceful errors; 4 rate-limited 429)", "rate_limiting": "aggressive without API key — search_papers 100% 429, paper/citations/author partially available", "trace": { "get_paper_attention": { "title": "Attention Is All You Need", "year": 2017, "citations": 181048, "venue": "NeurIPS", "has_tldr": true, "latency_ms": 602 }, "get_paper_alphafold": { "title": "Highly accurate protein structure prediction with AlphaFold", "year": 2021, "citations": 36636, "venue": "Nature", "open_access_pdf": true, "latency_ms": 1087 }, "get_paper_corpusid": { "title": "S2ORC: The Semantic Scholar Open Research Corpus", "year": 2020, "citations": 795, "latency_ms": 1073 }, "citations_attention": { "count": 3, "all_2026": true, "latency_ms": 624 }, "citations_alphafold": { "count": 5, "latency_ms": 628 }, "author_bengio": { "top_match": { "authorId": "1751762", "papers": 812, "citations": 576845, "hIndex": 213 }, "total_matches": 5, "latency_ms": 690 }, "author_hinton": { "total_matches": 5, "fragmented": true, "sum_papers": "~59 (should be ~600)", "latency_ms": 626 }, "search_papers": { "all_429": true, "attempts": 5 } } }
Supplementary: Remote gateway approach (0% of calls rate-limited vs 31% locally)
Key finding: The remote Pipeworx gateway at https://gateway.pipeworx.io/semanticscholar/mcp avoids the aggressive rate limiting that plagued the local library approach (which hit 429 on 100% of search_papers calls).
Remote gateway connection
curl -s -X POST 'https://gateway.pipeworx.io/semanticscholar/mcp' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json, text/event-stream' \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_papers","arguments":{"query":"attention is all you need transformer","limit":3}}}'

6 calls, 100% success (vs 69% locally)
search_papers — 2 calls, BOTH successful (vs 100% 429 locally):
- `{query: "attention is all you need transformer", limit: 3}` → 8,140 total hits. #1: "Attention is All you Need" (2017, 181,048 citations, NeurIPS). Abstract + author list included.
- `{query: "CRISPR gene editing", year: "2024-2026", fields_of_study: "Biology", limit: 3}` → 20,189 hits. Year and field filters work correctly. Open access PDF links present.
get_paper — 2 calls, both successful:
- By paperId: `"204e3073870fae3d05bcbc2f6a8e263d9b72e776"` → Full metadata including TLDR ("A new simple network architecture, the Transformer..."), fieldsOfStudy, publicationTypes, referenceCount (41).
- By arXiv prefix: `"arXiv:2106.15928"` → Resolved correctly to a 2021 Biology/Physics paper. TLDR present.
get_paper_citations — 1 call, successful:
- Attention paper, limit 3 → 3 citing papers (all 2026, freshly published). Forward citation tracing works.
get_author — 1 call, successful:
- `{name: "Yoshua Bengio"}` → 5 matches. Top: authorId 1751762, 812 papers, 576,845 citations, h-index 213. Consistent with the previous run's data.
Why the remote gateway avoids rate limiting
The Pipeworx gateway likely routes through its own API key with higher rate limits, while the local library module hits the Semantic Scholar public endpoints directly at the client's IP. The gateway also handles retries and caching server-side.
Recommendation
Use the remote gateway (https://gateway.pipeworx.io/semanticscholar/mcp) instead of the local library module for production use. It provides:
- No rate limiting (within 100 req/day anonymous Pipeworx tier)
- Identical tool interface (same 4 tools, same params)
- No TypeScript compilation needed
- Additional platform tools (askpipeworx, resolveentity, etc.)
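The curl request shown earlier translates directly to `fetch`. A minimal sketch, assuming Node 18+ (global `fetch`) and that the gateway answers a single `tools/call` with a plain JSON body; because the request also accepts `text/event-stream`, a streamed reply would need different parsing, which this sketch does not handle. `buildToolCall` and `callGateway` are names invented here.

```typescript
const GATEWAY_URL = "https://gateway.pipeworx.io/semanticscholar/mcp";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build the same JSON-RPC envelope the curl example sends.
function buildToolCall(name: string, args: Record<string, unknown>, id = 1): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// POST the envelope to the gateway. Needs network access, so it is only defined here.
async function callGateway(name: string, args: Record<string, unknown>): Promise<unknown> {
  const res = await fetch(GATEWAY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(buildToolCall(name, args)),
  });
  if (!res.ok) throw new Error(`gateway returned HTTP ${res.status}`);
  // Assumes a JSON body; a server-sent-events response would need line-by-line parsing.
  return res.json();
}
```

Keeping the envelope builder separate from the network call makes it easy to check that the payload matches the curl example byte for byte before spending any of the daily request allowance.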
{ "approach": "remote streamable-http gateway (vs local library module)", "gateway_url": "https://gateway.pipeworx.io/semanticscholar/mcp", "calls": 6, "success_rate": "100% (vs 69% locally)", "search_papers_success": "100% (vs 0% locally — all 429)", "key_finding": "remote gateway avoids rate limiting that makes local approach unusable for search_papers", "trace": { "search_transformer": { "input": { "query": "attention is all you need transformer", "limit": 3 }, "output": { "total": 8140, "top_title": "Attention is All you Need", "top_citations": 181048 } }, "search_crispr_filtered": { "input": { "query": "CRISPR gene editing", "year": "2024-2026", "fields_of_study": "Biology", "limit": 3 }, "output": { "total": 20189, "filters_applied": true } }, "get_paper_by_id": { "input": { "paper_id": "204e3073870fae3d05bcbc2f6a8e263d9b72e776" }, "output": { "title": "Attention is All you Need", "has_tldr": true, "referenceCount": 41 } }, "get_paper_by_arxiv": { "input": { "paper_id": "arXiv:2106.15928" }, "output": { "title": "Reinfection and low cross-immunity...", "year": 2021 } }, "citations": { "input": { "paper_id": "204e3073870fae3d05bcbc2f6a8e263d9b72e776", "limit": 3 }, "output": { "count": 3, "all_2026": true } }, "author_bengio": { "input": { "name": "Yoshua Bengio" }, "output": { "top_authorId": "1751762", "papers": 812, "citations": 576845, "hIndex": 213 } } } }