Web Scraping Glossary

Plain-English definitions for the concepts that show up when you build scrapers in production — anti-bot, proxies, HTTP status codes, browser automation.

245 terms across 10 categories.

Browse by category

Web Scraping APIs
76 terms

Core concepts behind modern web scraping APIs — what they do, how they handle hard sites, and where they fit in a data pipeline.

HTTP Errors
20 terms

Status codes scrapers hit constantly. Each entry explains what the code means, why it shows up in scraping, and how to recover from it.

Proxies
9 terms

Proxy types, rotation strategies, and the trade-offs between residential, datacenter, and mobile IP pools.

Anti-Bot
63 terms

How modern bot-detection systems work — fingerprinting, behavioral signals, and the challenges that block automated traffic.

Crawling
10 terms

Discovering and fetching pages at scale — crawl scope, politeness, sitemaps, and how scrapers traverse links without getting blocked or wasting budget.

Reverse Engineering
14 terms

Reading code that was built to resist reading — how obfuscation works, why every layer is reversible, and the techniques used to recover the original logic.

Python Web Scraping
15 terms

Python is the most common language for web scraping. These guides cover the libraries, frameworks, and trade-offs you'll weigh when building scrapers in Python.

Web Technologies
7 terms

The web protocols and primitives every scraper developer should understand — HTTP, cookies, and REST APIs.

Web Automation
22 terms

Browser automation, headless browsers, and how the major anti-bot vendors detect and block scrapers.

Web Scraping by Language
9 terms

Language-by-language guides to web scraping: the right libraries, runnable code, and how to get past anti-bot blocking in Java, C#, Go, Ruby, PHP, R, Node.js, and the command line.
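The HTTP Errors entries cover how to recover from codes scrapers hit constantly. As a rough illustration only (not code taken from the glossary), here is a minimal Python sketch of one common recovery pattern: retry transient statuses such as 429 (Too Many Requests) and 503 (Service Unavailable) with exponential backoff plus jitter, and honor a numeric `Retry-After` header when the server sends one. The set of retryable codes, the `fetch` callable, and the function names `backoff_delay` and `fetch_with_retries` are hypothetical assumptions; `fetch` stands in for whatever HTTP client you use and returns a `(status, headers)` pair.

```python
import random
import time
from typing import Callable, Optional

# Hypothetical choice of "transient" statuses; tune this per target site.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  retry_after: Optional[str] = None) -> float:
    """Seconds to wait before the retry that follows failed attempt `attempt` (0-based)."""
    # Honor Retry-After when it is given as delta-seconds (e.g. "2").
    # HTTP-date values are not parsed here and fall through to jitter.
    if retry_after is not None and retry_after.isdigit():
        return min(float(retry_after), cap)
    # Exponential backoff with "full jitter": uniform in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_retries(fetch: Callable[[], tuple[int, dict]],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `fetch` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = fetch()
        if status not in RETRYABLE_STATUSES:
            return status  # success, or a permanent error such as 404
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after=headers.get("Retry-After")))
    return status  # last retryable status after exhausting all attempts
```

Injecting `sleep` keeps the sketch testable without real delays. Permanent errors like 404 return immediately, since retrying them only spends request budget.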

All terms
