
Web Scraping Glossary

Plain-English definitions for the concepts that show up when you build scrapers in production — anti-bot, proxies, HTTP status codes, browser automation.


Interactive references

HTTP Status Code Reference

Every status code from 100 to 599, plus Cloudflare's 1xxx error codes, with a concept map showing how they relate and what each one means for scrapers.

Anti-Bot Detector

Paste a URL to see which bot-protection vendor guards it, with a concept map linking vendors to detection techniques.

Concept map

How the glossary connects

Every term is linked to the concepts it depends on. Search, filter by category, and click a term to preview it.

245 terms


Browse by category

Web Scraping APIs

76 terms

Core concepts behind modern web scraping APIs — what they do, how they handle hard sites, and where they fit in a data pipeline.

HTTP Errors

20 terms

Status codes scrapers hit constantly. Each entry explains what the code means, why it shows up in scraping, and how to recover from it.

Proxies

9 terms

Proxy types, rotation strategies, and the tradeoffs between residential, datacenter, and mobile IP pools.

Anti-Bot

63 terms

How modern bot-detection systems work — fingerprinting, behavioral signals, and the challenges that block automated traffic.

Crawling

10 terms

Discovering and fetching pages at scale — crawl scope, politeness, sitemaps, and how scrapers traverse links without getting blocked or wasting budget.

Reverse Engineering

14 terms

Reading code that was built to resist reading — how obfuscation works, why every layer is reversible, and the techniques used to recover the original logic.

Python Web Scraping

15 terms

Python is the most common language for web scraping. These guides cover the libraries, frameworks, and tradeoffs you'll weigh when building scrapers in Python.

Web Technologies

7 terms

The web protocols and primitives every scraper developer should understand — HTTP, cookies, and REST APIs.

Web Automation

22 terms

Browser automation, headless browsers, and how the major anti-bot vendors detect and block scrapers.

Web Scraping by Language

9 terms

Language-by-language guides to web scraping: the right libraries, runnable code, and how to get past anti-bot blocking in Java, C#, Go, Ruby, PHP, R, Node.js, and the command line.

All terms

What Is a 200 Status Code?

What Is Engine-Level (Blink) Browser Instrumentation?

What Is Kasada?

What Is zendriver?