pim97/scrappey-cli

scrappey-cli

The official command-line client for the Scrappey scraping API. Scrape any URL, bypass common antibot systems, and pipe markdown into jq/llm/cron — all from your shell.

npm install -g @scrappey/scrappey-cli

scrappey-cli auth --api-key YOUR_KEY
scrappey-cli scrape https://example.com -m -o example.md

Features

  • One binary, zero runtime dependencies: pure Node 18+, nothing but package.json.
  • Full API surface: GET/POST/PUT/PATCH/DELETE, sessions, proxy selection, custom headers, cookies, screenshots.
  • LLM-ready output: --markdown returns markdown for RAG pipelines.
  • Shell-native: HTML to stdout by default; pipe straight into jq, grep, or an LLM.
  • Safe key storage: ~/.config/scrappey-cli/config.json at 0600, overridable via SCRAPPEY_API_KEY or .env.

Install

# Global install
npm install -g @scrappey/scrappey-cli

# Or run without installing
npx @scrappey/scrappey-cli scrape https://example.com

Requires Node.js 18 or newer.

You'll need an API key from scrappey.com.

Authentication

Resolution order (first match wins):

  1. --api-key <key> flag on any command
  2. SCRAPPEY_API_KEY environment variable
  3. SCRAPPEY_API_KEY=… in a .env file in the current directory
  4. Saved config at ~/.config/scrappey-cli/config.json (file mode 0600)

# Save key to config (one-time)
scrappey-cli auth --api-key YOUR_KEY

# Inspect what's currently resolved (key is masked)
scrappey-cli auth --show
# → key=abcd…wxyz source=config

# Remove saved key
scrappey-cli auth --logout
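
The lookup order above can be mirrored in a small shell function, which is handy when a wrapper script needs to report where a key would come from. This is only a sketch, not the CLI's own code: key_source is a hypothetical helper, the labels other than config are made up for illustration, and the .env check is a simple grep rather than a full parser.

```shell
# Hypothetical helper (not part of scrappey-cli): prints which source
# would supply the API key, following the documented order.
# $1 is an optional key, passed the way --api-key would be.
key_source() {
  config_dir="${SCRAPPEY_CONFIG_DIR:-$HOME/.config/scrappey-cli}"
  if [ -n "${1:-}" ]; then
    echo flag                      # 1. --api-key flag
  elif [ -n "${SCRAPPEY_API_KEY:-}" ]; then
    echo env                       # 2. environment variable
  elif [ -f .env ] && grep -q '^SCRAPPEY_API_KEY=' .env; then
    echo dotenv                    # 3. .env in the current directory
  elif [ -f "$config_dir/config.json" ]; then
    echo config                    # 4. saved config file
  else
    echo none
    return 1
  fi
}
```

For the authoritative answer, scrappey-cli auth --show reports the source the CLI actually resolved.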

Commands

scrape <url>

Scrape a URL and print the body to stdout.

# Plain HTML to stdout
scrappey-cli scrape https://example.com

# Markdown for LLM pipelines
scrappey-cli scrape https://example.com -m

# Write HTML to file
scrappey-cli scrape https://example.com -o page.html

# Full JSON response (solution, cookies, status, timing)
scrappey-cli scrape https://example.com --json | jq '.solution.statusCode'

# Anti-bot + geo proxy
scrappey-cli scrape https://protected.example \
  --cloudflare --country UnitedStates --premium

# POST with JSON body and headers
scrappey-cli scrape https://httpbin.org/post \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"name":"demo","count":42}' --json

# Cheap request mode (no JS render)
scrappey-cli scrape https://api.example.com/data --request-type request --json

# Page screenshot
scrappey-cli scrape https://example.com --screenshot shot.png

Options:

Flag                                  Description
-o, --output <file>                   Write body to file (default: stdout)
-m, --markdown                        Return markdown instead of HTML
--json                                Print full JSON response
-X, --method <METHOD>                 HTTP method (GET, POST, PUT, DELETE, PATCH)
-d, --data <json|string>              Request body for POST/PUT/PATCH
-H, --header <K:V>                    Custom header (repeatable)
--cookies <string>                    Cookie string to set
--request-type <t>                    browser (default) or request
--country <code>                      Proxy country (e.g. UnitedStates, Germany)
--premium / --mobile                  Premium / mobile proxy pool
--cloudflare / --datadome / --kasada  Enable antibot bypass
--solve-captchas                      Auto-solve detected captchas
--session <id>                        Reuse an existing session
--screenshot <file>                   Save page screenshot
--timeout <ms>                        Per-request timeout

auth

scrappey-cli auth --api-key KEY     # save key
scrappey-cli auth --show            # show source (masked)
scrappey-cli auth --logout          # remove saved key
scrappey-cli auth                   # prompt for key on stdin

balance

scrappey-cli balance
# → { "balance": 12345, ... }

session create | destroy

ID=$(scrappey-cli session create --country UnitedStates)
scrappey-cli scrape https://example.com --session "$ID"
scrappey-cli scrape https://example.com/next --session "$ID"
scrappey-cli session destroy "$ID"
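
If a scrape fails partway through, a script written like the one above can exit before it reaches session destroy. A minimal sketch of a wrapper that always cleans up follows; with_session is a hypothetical helper, and it assumes session create prints only the session ID on stdout, as the ID=$(...) capture above implies.

```shell
# Hypothetical helper (not part of scrappey-cli): create a session, run
# the given command with the session ID appended as its last argument,
# then destroy the session whether or not the command succeeded.
with_session() {
  id=$(scrappey-cli session create --country UnitedStates) || return 1
  status=0
  "$@" "$id" || status=$?
  scrappey-cli session destroy "$id"
  return "$status"               # propagate the command's exit code
}

# Usage (scrape_pages is your own function; it receives the ID as $1):
#   scrape_pages() {
#     scrappey-cli scrape https://example.com --session "$1" &&
#     scrappey-cli scrape https://example.com/next --session "$1"
#   }
#   with_session scrape_pages
```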

Programmatic use

The CLI is also a library — useful if you want the client in a script without adding a full SDK:

import { ScrappeyClient } from '@scrappey/scrappey-cli';

const client = new ScrappeyClient({ apiKey: process.env.SCRAPPEY_API_KEY });
const res = await client.get({ url: 'https://example.com', markdown: true });
console.log(res.solution.markdown);

Pipelines

Classic Unix composition:

# Scrape + filter JSON
scrappey-cli scrape https://httpbin.org/json --json | jq '.solution.response | fromjson'

# Scrape + LLM summary
scrappey-cli scrape https://blog.example.com/post -m | llm 'summarize in 3 bullets'

# Batch from a file
mkdir -p out   # make sure the output directory exists
while IFS= read -r url; do   # -r keeps backslashes in URLs intact
  scrappey-cli scrape "$url" -m -o "out/$(echo "$url" | md5sum | cut -d' ' -f1).md"
done < urls.txt

Environment variables

Variable             Purpose
SCRAPPEY_API_KEY     API key; takes precedence over the saved config
SCRAPPEY_CONFIG_DIR  Override config directory (default ~/.config/scrappey-cli)
SCRAPPEY_LIVE=1      Enable live integration tests

Development

git clone https://github.com/YOUR_USER/scrappey-cli.git
cd scrappey-cli
npm test                                            # unit + CLI tests (no network)
SCRAPPEY_LIVE=1 SCRAPPEY_API_KEY=... npm run test:live   # hits real API (~3 credits)
node bin/scrappey-cli.js scrape https://example.com # run locally

No runtime dependencies; test runner is Node's built-in node --test.

Exit codes

Code  Meaning
0     Success
1     API / network error
2     Bad usage (missing key, unknown command, bad flag)
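
Because the codes separate transient failures from usage mistakes, a script can retry on one and stop on the other. A minimal sketch follows; retry_scrape, the budget of three attempts, and the RETRY_DELAY variable are all hypothetical and not part of the CLI.

```shell
# Hypothetical wrapper (not part of scrappey-cli): rerun a command while
# it exits with 1 (API / network error). Any other code, including
# 2 (bad usage), is returned at once because retrying cannot fix it.
retry_scrape() {
  attempts=3                      # assumed retry budget
  n=1
  while :; do
    status=0
    "$@" || status=$?
    [ "$status" -ne 1 ] && return "$status"
    [ "$n" -ge "$attempts" ] && return 1
    sleep "${RETRY_DELAY:-1}"     # seconds to wait between attempts
    n=$((n + 1))
  done
}

# Usage:
#   retry_scrape scrappey-cli scrape https://example.com -m -o example.md
```

Writing to a file with -o, as in the usage line, avoids piping output from a failed attempt downstream.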

License

MIT — see LICENSE. Contributions welcome via pull request.

Users are responsible for complying with the Scrappey terms of service and with the terms of any website they scrape.
