All features
Everything.
Nothing extra.
Scrapeman is built for scraping engineers. Six auth schemes, a load runner with watched headers, a WebSocket client, pre- and post-request scripts, native Scrape.do mode, and a git-friendly .sman file format. Apache 2.0, no account, no cloud sync.
HTTP Engine
Built on undici
Scrapeman's HTTP core uses undici, Node's official HTTP client and the layer behind built-in fetch. Spec-compliant, fast, with proxy and HTTP/2 support out of the box.
GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS, plus custom verbs like PROPFIND and QUERY.
Per-request allowH2 toggle. When it is on, HTTP/2 is negotiated with the server via ALPN.
Skip cert checks the way curl does with -k. Self-signed proxies, expired certs, mitmproxy debug. The URL bar shows a red Insecure badge while it is on.
Set any header, override auto-managed ones, or disable them per request. Bulk-edit toggle switches the table to a Key: Value textarea.
JSON, form-urlencoded, multipart, raw text, binary, plus a CodeMirror editor with {{var}} highlighting, autocomplete, and lenient JSON beautify with auto-fix.
URL bar and Params table stay two-way synced. Disabled rows survive saves and reloads.
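The two-way sync can be pictured as a pair of pure functions, one per direction. This is a minimal sketch, not Scrapeman's code: the ParamRow shape and both function names are assumptions made for illustration, and disabled rows are simply appended at the end instead of keeping their table position.

```typescript
// Hypothetical row shape for a Params table; disabled rows are kept but never sent.
interface ParamRow {
  key: string;
  value: string;
  enabled: boolean;
}

// Table -> URL: rebuild the query string from the enabled rows only.
function applyRowsToUrl(rawUrl: string, rows: ParamRow[]): string {
  const url = new URL(rawUrl);
  url.search = "";
  for (const row of rows) {
    if (row.enabled) url.searchParams.append(row.key, row.value);
  }
  return url.toString();
}

// URL -> table: read enabled rows from the URL, then re-attach the disabled
// rows from the previous table state so editing the URL bar does not drop them.
function syncRowsFromUrl(rawUrl: string, previous: ParamRow[]): ParamRow[] {
  const enabled: ParamRow[] = [];
  new URL(rawUrl).searchParams.forEach((value, key) => {
    enabled.push({ key, value, enabled: true });
  });
  const disabled = previous.filter((row) => !row.enabled);
  return enabled.concat(disabled);
}
```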
Each press fires another request without cancelling the in-flight one. Live HUD tracks every parallel send with status and elapsed duration.
HTTP / HTTPS proxies with basic auth via undici ProxyAgent.
Response Handling
Nothing hidden, nothing lost
Responses arrive fully decompressed and rendered with syntax highlighting. Large bodies stay smooth thanks to virtualization.
gzip, brotli, and deflate responses decode automatically.
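For readers curious what automatic decoding involves, here is a small sketch using Node's zlib module. The decodeBody function is a hypothetical helper, not Scrapeman's implementation, and it handles a single Content-Encoding value only.

```typescript
import {
  brotliCompressSync,
  brotliDecompressSync,
  deflateSync,
  gunzipSync,
  gzipSync,
  inflateSync,
} from "node:zlib";

// Decode a response body according to its Content-Encoding header value.
// Stacked codings such as "gzip, br" are out of scope for this sketch.
function decodeBody(body: Buffer, contentEncoding: string | undefined): Buffer {
  switch ((contentEncoding ?? "").trim().toLowerCase()) {
    case "gzip":
    case "x-gzip":
      return gunzipSync(body);
    case "br":
      return brotliDecompressSync(body);
    case "deflate":
      return inflateSync(body); // zlib-wrapped deflate
    default:
      return body; // identity or unknown coding: pass through untouched
  }
}

// Usage: round-trip one payload through each coding.
const sample = Buffer.from('{"ok":true}');
const encoded: Array<[string, Buffer]> = [
  ["gzip", gzipSync(sample)],
  ["br", brotliCompressSync(sample)],
  ["deflate", deflateSync(sample)],
];
for (const [coding, packed] of encoded) {
  console.log(coding, decodeBody(packed, coding).toString("utf8"));
}
```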
text/event-stream responses get an Events mode with id, event, retry, and a JSON tree per data block. Export as JSON.
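The fields shown in Events mode come straight from the text/event-stream wire format. The parser below is a minimal sketch of that format under a few stated simplifications: parseEventStream and the SseEvent shape are illustrative names, the last seen id is carried forward to later events, and retry is attached to the event it arrived with.

```typescript
interface SseEvent {
  id?: string;
  event: string; // "message" when the stream names no event type
  retry?: number;
  data: string;
  json?: unknown; // set only when data is valid JSON
}

function parseEventStream(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let id: string | undefined;
  let eventType = "";
  let retry: number | undefined;
  let data: string[] = [];

  // A blank line ends the current block; blocks without data are dropped.
  const flush = (): void => {
    if (data.length > 0) {
      const joined = data.join("\n");
      const ev: SseEvent = { event: eventType || "message", data: joined };
      if (id !== undefined) ev.id = id;
      if (retry !== undefined) ev.retry = retry;
      try {
        ev.json = JSON.parse(joined);
      } catch {
        // data is not JSON; leave json unset
      }
      events.push(ev);
    }
    eventType = "";
    retry = undefined;
    data = [];
  };

  for (const line of text.split(/\r\n|\n|\r/)) {
    if (line === "") {
      flush();
      continue;
    }
    if (line.startsWith(":")) continue; // comment line, often used as keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "data") data.push(value);
    else if (field === "event") eventType = value;
    else if (field === "id") id = value;
    else if (field === "retry" && /^\d+$/.test(value)) retry = Number(value);
  }
  return events; // a trailing block with no closing blank line is discarded
}
```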
Large responses captured fully. Virtualized rendering means a 5 MB body still scrolls smoothly. Save the response to a file with one click.
CodeMirror with one-dark in dark mode for JSON, HTML, XML, JavaScript, and CSS. JSON also has a collapsible Tree view with JSONPath copy.
Sandboxed iframe renders external CSS, images, and fonts. A base href is injected so relative URLs resolve to the originating server.
150 ms debounce, line-indexed match table, Enter / Shift+Enter navigation. The query persists across sends and auto re-runs.
Timing waterfall (DNS / TCP / TLS / TTFB / Download), sent URL and headers, redirect chain, TLS cert info with days-remaining warning, remote IP and HTTP version.
DNS, connect, TLS, TTFB, and download segmented with millisecond labels.
Authentication
Every scheme. No friction.
Six auth types built in. Tokens cache, refresh before expiry, and dedup across concurrent sends.
Plus AWS Signature v4 via aws4. API Key supports header and query placement.
Token fetched automatically, cached until expiry, refreshed before it expires. Concurrent requests share one in-flight fetch.
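Sharing one in-flight fetch between concurrent requests is a small pattern in its own right. The sketch below shows one way to do it, assuming a hypothetical TokenCache class, a caller-supplied fetchToken function, and a 30-second refresh margin picked for illustration.

```typescript
interface Token {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

class TokenCache {
  private cached?: Token;
  private inFlight?: Promise<Token>;
  private readonly fetchToken: () => Promise<Token>;
  private readonly refreshSkewMs: number;
  private readonly now: () => number;

  constructor(
    fetchToken: () => Promise<Token>,
    refreshSkewMs = 30_000, // assumed margin: refresh this long before expiry
    now: () => number = () => Date.now(),
  ) {
    this.fetchToken = fetchToken;
    this.refreshSkewMs = refreshSkewMs;
    this.now = now;
  }

  get(): Promise<Token> {
    const token = this.cached;
    if (token && token.expiresAt - this.refreshSkewMs > this.now()) {
      return Promise.resolve(token); // still fresh: serve from cache
    }
    if (!this.inFlight) {
      // First caller starts the fetch; later callers reuse the same promise.
      this.inFlight = this.fetchToken()
        .then((fresh) => {
          this.cached = fresh;
          return fresh;
        })
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }
}
```

Two callers that ask for a token in the same tick receive the same promise, so the token endpoint is hit once.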
Browser-based flow with a local loopback callback on an ephemeral port. State validated on return.
S256 code challenge, no client secret required. Refresh token flow with proactive refresh 30 seconds before expiry.
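The S256 challenge is defined by RFC 7636 as the base64url-encoded SHA-256 hash of the code verifier. A minimal sketch with Node's crypto module follows; the function names and the buildAuthorizeUrl parameters are illustrative, not Scrapeman's API.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to a 43-character base64url verifier,
// the minimum length RFC 7636 allows.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: BASE64URL(SHA256(ASCII(code_verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Build the authorization request. The state value is echoed back on the
// redirect and should be compared before the code is exchanged.
function buildAuthorizeUrl(
  authUrl: string,
  clientId: string,
  redirectUri: string,
  verifier: string,
  state: string,
): string {
  const url = new URL(authUrl);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("code_challenge", codeChallengeS256(verifier));
  url.searchParams.set("code_challenge_method", "S256");
  url.searchParams.set("state", state);
  return url.toString();
}
```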
Point at a .well-known/openid-configuration URL and Scrapeman autofills Token URL, Auth URL, and supported scopes.
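The autofill maps three standard fields of the OpenID Connect discovery document onto the form. A rough sketch, assuming a hypothetical OAuthFormFields shape for the form side:

```typescript
// Subset of the OpenID Connect discovery document used for autofill.
interface DiscoveryDocument {
  authorization_endpoint?: string;
  token_endpoint?: string;
  scopes_supported?: string[];
}

// Hypothetical shape of the OAuth form being filled.
interface OAuthFormFields {
  authUrl: string;
  tokenUrl: string;
  scopes: string[];
}

// Derive the discovery URL from an issuer, tolerating trailing slashes.
function discoveryUrl(issuer: string): string {
  return issuer.replace(/\/+$/, "") + "/.well-known/openid-configuration";
}

// Missing fields fall back to empty values so the user can fill them by hand.
function fieldsFromDiscovery(doc: DiscoveryDocument): OAuthFormFields {
  return {
    authUrl: doc.authorization_endpoint ?? "",
    tokenUrl: doc.token_endpoint ?? "",
    scopes: doc.scopes_supported ?? [],
  };
}
```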
Authorization header (default), query param, or form body field.
Decodes header and payload of access_token and id_token with a live exp countdown. Display only, no signature check.
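Decoding without verification only needs base64url and JSON. The sketch below shows the idea; decodeJwt and the DecodedJwt shape are illustrative names, and like the inspector it performs no signature check.

```typescript
interface DecodedJwt {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
  secondsLeft?: number; // time until exp; negative once expired
}

// Display-only decode: the signature segment is ignored entirely.
function decodeJwt(token: string, nowMs: number = Date.now()): DecodedJwt {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("expected three dot-separated segments");
  }
  const decodeSegment = (segment: string): Record<string, unknown> =>
    JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));

  const header = decodeSegment(parts[0]);
  const payload = decodeSegment(parts[1]);
  const exp = payload.exp; // seconds since the epoch, per RFC 7519
  const secondsLeft =
    typeof exp === "number" ? exp - Math.floor(nowMs / 1000) : undefined;
  return { header, payload, secondsLeft };
}
```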
Scripts and load testing
Pre/post scripts and a real load runner
JavaScript before and after every request. Stress-test any endpoint without leaving the app.
Mutate URL, headers, and body via the req proxy. Read and write variables across folder, collection, environment, and global scopes via bru.
Inspect res.getStatus() and res.getBody() (auto-parsed JSON). Run assertions with test() / expect().toBe(). Failures render in the Scripts response tab.
Node vm context with a 5-second timeout. No require, process, or import. Scripts round-trip through .sman as YAML literal blocks.
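To make the sandbox model concrete, here is a minimal sketch built on Node's vm module. The res.getStatus(), res.getBody(), test(), and expect().toBe() names come from the feature list; everything else (runScript, the TestResult shape, the stand-in implementations) is an assumption for illustration.

```typescript
import { runInNewContext } from "node:vm";

interface TestResult {
  name: string;
  passed: boolean;
  error?: string;
}

// Run a post-request script in a fresh context that exposes only res, test, and expect.
function runScript(
  source: string,
  response: { status: number; body: unknown },
  timeoutMs = 5_000,
): TestResult[] {
  const results: TestResult[] = [];
  const sandbox = {
    res: {
      getStatus: () => response.status,
      getBody: () => response.body,
    },
    test(name: string, fn: () => void): void {
      try {
        fn();
        results.push({ name, passed: true });
      } catch (err) {
        results.push({ name, passed: false, error: String(err) });
      }
    },
    expect(actual: unknown) {
      return {
        toBe(expected: unknown): void {
          if (actual !== expected) {
            throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
          }
        },
      };
    },
  };
  // require, process, and import are absent because the new context starts
  // with only the properties placed on the sandbox object.
  runInNewContext(source, sandbox, { timeout: timeoutMs });
  return results;
}

// Usage: one passing and one failing assertion.
const report = runScript(
  `test("status is 200", () => expect(res.getStatus()).toBe(200));
   test("has id 7", () => expect(res.getBody().id).toBe(7));`,
  { status: 200, body: { id: 8 } },
);
console.log(report);
```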
Bounded concurrency, live RPS, p50/p95/p99 latency, status histogram, error kind breakdown. Per-tab isolation so a run survives tab switches.
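For reference, the latency percentiles and RPS figures can be computed as in the sketch below. It uses the nearest-rank percentile method, which is one common definition and not necessarily the one Scrapeman applies; summarizeRun is a hypothetical name.

```typescript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length, Math.max(1, rank)) - 1;
  return sortedMs[index];
}

interface RunSummary {
  rps: number;
  p50: number;
  p95: number;
  p99: number;
}

// latenciesMs: one entry per completed request; wallClockMs: total run time.
function summarizeRun(latenciesMs: number[], wallClockMs: number): RunSummary {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    rps: latenciesMs.length / (wallClockMs / 1000),
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}
```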
Track up to 10 headers across iterations. Per-status value distribution, top-5 values, and numeric stats (min/max/avg/p50/p95/p99) when at least 95% of values parse as numbers.
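The numeric-stats rule can be sketched as a threshold check over the observed values of one header. In this illustration, summarizeHeaderValues is a hypothetical name, the 95% figure is treated as a minimum share, and only min, max, and avg are computed to keep it short.

```typescript
interface HeaderSummary {
  top: Array<{ value: string; count: number }>;
  numeric?: { min: number; max: number; avg: number };
}

function summarizeHeaderValues(values: string[], numericShare = 0.95): HeaderSummary {
  // Count occurrences and keep the five most frequent values.
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  const top = Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, 5)
    .map(([value, count]) => ({ value, count }));

  // Numeric stats only apply when enough of the values parse as numbers.
  const numbers = values
    .filter((value) => value.trim() !== "")
    .map((value) => Number(value))
    .filter((n) => Number.isFinite(n));
  if (values.length === 0 || numbers.length / values.length < numericShare) {
    return { top };
  }
  const sum = numbers.reduce((acc, n) => acc + n, 0);
  return {
    top,
    numeric: {
      min: Math.min(...numbers),
      max: Math.max(...numbers),
      avg: sum / numbers.length,
    },
  };
}
```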
Ring buffer (1 to 1000) captures bodies of failed iterations. Export as JSON for offline triage.
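A ring buffer keeps the most recent N items and overwrites the oldest once full. The class below is a minimal sketch of that structure with the 1 to 1000 capacity bound from the description; the RingBuffer name and its methods are illustrative.

```typescript
class RingBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private next = 0; // slot the next push will write to

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1 || capacity > 1000) {
      throw new RangeError("capacity must be an integer from 1 to 1000");
    }
    this.capacity = capacity;
  }

  // Append until full, then overwrite the oldest entry.
  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
    } else {
      this.items[this.next] = item;
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Snapshot in insertion order, oldest first, ready for JSON export.
  toArray(): T[] {
    if (this.items.length < this.capacity) return [...this.items];
    return [...this.items.slice(this.next), ...this.items.slice(0, this.next)];
  }
}

// Usage: with capacity 3, the fourth failure evicts the first.
const failures = new RingBuffer<string>(3);
for (const body of ["a", "b", "c", "d"]) failures.push(body);
console.log(JSON.stringify(failures.toArray())); // ["b","c","d"]
```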
Run any folder sequentially or in parallel. CSV-driven iterations, abort mid-run, export the report as JSON, CSV, or self-contained HTML.
Variables and collections
Scoped variables, git-friendly files
One .sman YAML file per request. Variables resolve through five scopes with a clear precedence.
Custom YAML with stable key order. Bodies above 4 KB auto-promote to a sidecar file. Legacy .req.yaml files read as-is and migrate on first save.
Variables live at every level via _folder.yaml, .scrapeman/collection.yaml, and .scrapeman/globals.yaml. Auth blocks at the folder or collection level inherit down the tree.
Folder chain > active environment > collection > global > built-in. The leftmost scope that defines a variable wins.
{{random}}, {{uuid}}, {{timestamp}}, {{timestampSec}}, {{isoDate}}, {{randomInt}} re-resolve on every send.
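Scope precedence and per-send built-ins can be sketched as one template resolver. This is an illustration under assumptions: resolveTemplate is a hypothetical name, only four of the built-ins are modeled, {{timestamp}} is assumed to be in milliseconds, and unknown variables are left untouched.

```typescript
import { randomUUID } from "node:crypto";

type Scope = Record<string, string>;

// Built-ins are functions, so each send produces fresh values.
const builtIns: Record<string, () => string> = {
  uuid: () => randomUUID(),
  timestamp: () => String(Date.now()), // assumed to be milliseconds
  timestampSec: () => String(Math.floor(Date.now() / 1000)),
  isoDate: () => new Date().toISOString(),
};

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// scopes are ordered highest precedence first:
// folder chain, active environment, collection, global.
function resolveTemplate(template: string, scopes: Scope[]): string {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match: string, name: string) => {
    for (const scope of scopes) {
      if (hasOwn(scope, name)) return scope[name];
    }
    if (hasOwn(builtIns, name)) return builtIns[name]();
    return match; // unresolved variables stay visible as {{name}}
  });
}
```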
Open several workspaces in one window and switch from the sidebar header. Per-workspace tabs, env, and view persist across restart.
Builder state (URL, headers, body, auth, settings, scripts) round-trips through localStorage. Transient runtime is stripped before saving.
Right-click a request to stop syncing it. Backed by .git/info/exclude so teammates never see it. Cmd+Shift+H toggles on the active tab.
Status bar branch, source-control panel, stage / commit / push / pull, diff viewer. Diverged-branch dialog offers Rebase or Merge commit.
Scraping-first features
Built for real-world targets
Scrape.do native mode, anti-bot detection, UA presets, rotating proxies, and rate limiting. The scraping use case is a one-toggle affair.
One toggle rewrites the URL to api.scrape.do and forwards residential rotation, JS rendering, geo targeting, and ban retry parameters.
Cloudflare, HTTP 429, CAPTCHA markers, and bot-block bodies surface as a dismissable banner above the response with a Retry-After countdown.
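A Retry-After countdown starts from the header value, which HTTP allows in two forms: a number of seconds or an HTTP date. The helper below is a small sketch of that parsing; retryAfterMs is a hypothetical name.

```typescript
// Returns the wait in milliseconds, or undefined when the header is absent or unparseable.
function retryAfterMs(
  headerValue: string | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (headerValue === undefined) return undefined;
  const value = headerValue.trim();
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000; // delay-seconds form
  }
  const dateMs = Date.parse(value); // HTTP-date form
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, dateMs - nowMs); // dates in the past mean retry now
}
```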
9 presets: Scrapeman default, Chrome 124 macOS / Windows, Firefox 125 macOS / Windows, Safari 17 macOS / iOS, Googlebot, curl. Custom UA in Headers always overrides.
List of proxy URLs with round-robin or random strategy. Collection Runner rotates per request, Load Runner rotates per concurrent slot.
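The two rotation strategies reduce to a picker function that returns the next proxy URL on each call. A minimal sketch, with createProxyPicker as a hypothetical name and an injectable random source so the behaviour can be tested:

```typescript
type RotationStrategy = "round-robin" | "random";

// Returns a function that yields the next proxy URL each time it is called.
function createProxyPicker(
  proxies: string[],
  strategy: RotationStrategy,
  rng: () => number = Math.random,
): () => string {
  if (proxies.length === 0) throw new Error("proxy list is empty");
  let cursor = 0;
  return () => {
    if (strategy === "random") {
      return proxies[Math.floor(rng() * proxies.length)];
    }
    const proxy = proxies[cursor];
    cursor = (cursor + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}
```

A runner that rotates per request would call the picker once per send. One that rotates per concurrent slot would call it once when each slot starts and reuse the result.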
Fixed delay plus optional jitter (min / max ms). Honoured by Collection Runner and Load Runner.
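One plausible reading of fixed delay plus jitter is a base wait with a uniformly random extra between the min and max. The sketch below assumes that interpretation; nextDelayMs, the Jitter shape, and sleep are illustrative names.

```typescript
interface Jitter {
  minMs: number;
  maxMs: number;
}

// Base delay plus a uniform random extra in [minMs, maxMs).
function nextDelayMs(
  fixedMs: number,
  jitter?: Jitter,
  rng: () => number = Math.random,
): number {
  if (!jitter) return fixedMs;
  const span = jitter.maxMs - jitter.minMs;
  return fixedMs + jitter.minMs + rng() * span;
}

// Promise-based pause a runner could await between sends.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```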
Domain filter, manual add and inline edit, httpOnly value masking with reveal, JSON and Netscape exports, paste-import accepting document.cookie or cookies.txt.
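The two paste formats are easy to tell apart: cookies.txt lines carry seven tab-separated fields, while document.cookie is a single line of name=value pairs joined by semicolons. A rough sketch of both parsers follows; the function names and the CookieEntry shape are assumptions.

```typescript
interface CookieEntry {
  name: string;
  value: string;
  domain?: string;
  path?: string;
  secure?: boolean;
  httpOnly?: boolean;
  expires?: number; // epoch seconds; 0 marks a session cookie
}

// "a=1; b=2" as produced by document.cookie in a browser console.
function parseDocumentCookie(text: string): CookieEntry[] {
  return text
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair !== "")
    .map((pair) => {
      const eq = pair.indexOf("=");
      return eq === -1
        ? { name: pair, value: "" }
        : { name: pair.slice(0, eq).trim(), value: pair.slice(eq + 1).trim() };
    });
}

// Netscape cookies.txt: domain, subdomain flag, path, secure, expiry, name, value.
function parseNetscapeCookies(text: string): CookieEntry[] {
  const cookies: CookieEntry[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    let line = rawLine;
    let httpOnly = false;
    if (line.startsWith("#HttpOnly_")) {
      httpOnly = true; // curl-style marker for HttpOnly cookies
      line = line.slice("#HttpOnly_".length);
    } else if (line.startsWith("#") || line.trim() === "") {
      continue; // comment or blank line
    }
    const fields = line.split("\t");
    if (fields.length !== 7) continue;
    const [domain, , path, secure, expires, name, value] = fields;
    cookies.push({
      name,
      value,
      domain,
      path,
      secure: secure === "TRUE",
      httpOnly,
      expires: Number(expires),
    });
  }
  return cookies;
}

// Tabs only appear in the cookies.txt format, so they serve as the discriminator.
function parsePastedCookies(text: string): CookieEntry[] {
  return text.includes("\t") ? parseNetscapeCookies(text) : parseDocumentCookie(text);
}
```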
undici maxHeaderSize bumped to 256 KiB so Cloudflare-fronted responses with big rotating cookies and signed proof-of-work tokens stop bouncing.
WebSocket
Bidirectional WebSocket on every tab
A WebSocket pane lives next to the HTTP request builder. Connect, send, and trace messages without leaving the tab.
Each message has a direction arrow, timestamp, and payload. JSON payloads expand inline with the Tree viewer.
Keep-alive ping every 30 seconds with round-trip latency in the pong row.
Routes through standard HTTP proxies or Scrape.do WS proxy endpoints.
Switching tabs leaves the connection open. The timeline picks up where you left off when you come back.
Save the full message log as JSON for replay or audit.
Import and export
Bring your existing collections
Read from five external formats. Export history as HAR. More export targets on the roadmap.
Parse from URL or paste. Tags become folders, paths and methods become requests, security schemes become auth, server URLs go to {{base_url}}.
Folder hierarchy, auth (basic, bearer, apikey, oauth2, awsSigV4), variables, body modes round-trip.
Reads INI-like .bru files: methods, headers, auth (bearer / basic), bodies, query and path params.
Walks the resource list by type, rebuilds the folder tree from _id / parentId, maps all five auth types.
Built on @scrape-do/curl-parser. Handles ANSI-C $'...' quoting, multipart, urlencoded, proxy, referer, basic auth, cookies, custom UA.
Import Chrome DevTools HAR exports. Export history back to HAR. Round-trip tested.
Generate curl, JS fetch, Python requests, or Go net/http from the current request. Inline resolved values or keep {{var}} templates.
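As an illustration of the curl target, the sketch below renders a request as a curl command and either keeps {{var}} templates or inlines values when a variable map is supplied. RequestSpec, shellQuote, and toCurl are hypothetical names, and the output format is one reasonable choice, not necessarily what Scrapeman emits.

```typescript
interface RequestSpec {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// POSIX single-quote escaping: close the quote, emit \', reopen.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Pass vars to inline resolved values; omit it to keep {{var}} templates.
function toCurl(req: RequestSpec, vars?: Record<string, string>): string {
  const fill = (text: string): string =>
    vars === undefined
      ? text
      : text.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
          Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
        );

  const parts = ["curl", "-X", req.method, shellQuote(fill(req.url))];
  for (const [name, value] of Object.entries(req.headers)) {
    parts.push("-H", shellQuote(`${name}: ${fill(value)}`));
  }
  if (req.body !== undefined) {
    parts.push("--data-raw", shellQuote(fill(req.body)));
  }
  return parts.join(" ");
}
```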
Privacy
No data ever leaves your machine
No analytics. No crash reporting. No cloud sync. No account. Your requests and responses stay on your disk.
Collections live as .sman files in a workspace folder you choose. History is per-workspace JSONL in app data.
{{token}} stays {{token}} on disk. Secrets are never baked in.
Zero metrics, analytics, or usage data collected or transmitted.
Download and run. No registration, no email, no OAuth.
Nothing is backed up to a third-party server. Your data is your responsibility.
Apache 2.0 with an explicit patent grant in §3. Read the source on GitHub.