Plain-English definitions for the concepts that show up when you build scrapers in production: anti-bot systems, proxies, HTTP status codes, and browser automation.
Every term is linked to the concepts it depends on. Search, filter by category, and click a term to preview it.
Core concepts behind modern web scraping APIs — what they do, how they handle hard sites, and where they fit in a data pipeline.
Status codes scrapers hit constantly. Each entry explains what the code means, why it shows up in scraping, and how to recover from it.
Proxy types, rotation strategies, and the trade-offs between residential, datacenter, and mobile IP pools.
How modern bot-detection systems work — fingerprinting, behavioral signals, and the challenges that block automated traffic.
Discovering and fetching pages at scale — crawl scope, politeness, sitemaps, and how scrapers traverse links without getting blocked or wasting budget.
Reading code that was built to resist reading — how obfuscation works, why every layer is reversible, and the techniques used to recover the original logic.
Python is the most common language for web scraping. These guides cover the libraries, frameworks, and trade-offs you'll weigh when building scrapers in Python.
The web protocols and primitives every scraper developer should understand — HTTP, cookies, and REST APIs.
Browser automation, headless browsers, and how the major anti-bot vendors detect and block scrapers.
Language-by-language guides to web scraping: the right libraries, runnable code, and how to get past anti-bot blocking in Java, C#, Go, Ruby, PHP, R, Node.js, and the command line.
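As a small illustration of the status-code category above, here is a minimal Python sketch of how a scraper might decide what to do with a response code and how long to wait before retrying. The grouping of codes, the action labels, and the names `classify_status` and `backoff_delay` are assumptions made for this example, not part of any particular library or of the glossary entries themselves.

```python
import random

# Assumed grouping for illustration; tune it for the sites you scrape.
RETRYABLE = {408, 425, 429, 500, 502, 503, 504}  # transient: wait, then retry
BLOCKED = {403}  # often an anti-bot block: change proxy or fingerprint


def classify_status(code: int) -> str:
    """Map an HTTP status code to a coarse recovery action."""
    if 200 <= code < 300:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    if code in BLOCKED:
        return "rotate"
    return "fail"  # e.g. 404 or 410: retrying will not help


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, in seconds (attempt starts at 0)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


if __name__ == "__main__":
    for code in (200, 429, 403, 404):
        print(code, classify_status(code))
```

In practice, a 429 or 503 response may also carry a `Retry-After` header, which is usually worth honoring ahead of a computed delay.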