
Web fetch

The web_fetch tool does a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript.

For JS-heavy sites or login-protected pages, use the Web Browser instead.

Quick start

web_fetch is enabled by default -- no configuration needed. The agent can call it immediately:

javascript

await web_fetch({ url: "https://example.com/article" });

Tool parameters

url (string, required)

URL to fetch. http(s) only.

extractMode ('markdown' | 'text', default: 'markdown')

Output format after main-content extraction.

maxChars (number)

Truncate output to this many characters.
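A minimal sketch of how the three parameters constrain a call, written as plain JavaScript validation. The function name normalizeWebFetchArgs is hypothetical; it only mirrors the rules listed above (http(s) URLs, the two extract modes, a numeric maxChars) and is not OpenClaw's implementation.

```javascript
// Hypothetical sketch: validate web_fetch arguments per the parameter list.
// normalizeWebFetchArgs is an illustrative name, not OpenClaw's API.
const EXTRACT_MODES = new Set(["markdown", "text"]);

function normalizeWebFetchArgs({ url, extractMode = "markdown", maxChars } = {}) {
  if (typeof url !== "string") {
    throw new TypeError("url is required and must be a string");
  }
  const parsed = new URL(url); // throws on malformed URLs
  // Only http(s) is accepted.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  if (!EXTRACT_MODES.has(extractMode)) {
    throw new Error(`extractMode must be 'markdown' or 'text', got '${extractMode}'`);
  }
  // maxChars is optional; when omitted, the configured default applies.
  if (maxChars !== undefined && (!Number.isInteger(maxChars) || maxChars <= 0)) {
    throw new Error("maxChars must be a positive integer");
  }
  return { url: parsed.href, extractMode, maxChars };
}
```

With only a URL, the sketch fills in extractMode as 'markdown' and leaves maxChars unset so a configured default can take over.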

How it works

Fetch

Sends an HTTP GET with a Chrome-like User-Agent and an Accept-Language header. It blocks private/internal hostnames and re-checks every redirect target.
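The Fetch step can be sketched as a GET that follows redirects by hand, so each hop is checked before it is requested. This assumes a runtime with a WHATWG-style fetch (such as Node.js 18+). fetchWithRedirectChecks, the injected isAllowed predicate, and the header values are illustrative assumptions, and a real SSRF guard would also check resolved IP addresses, not just hostnames.

```javascript
// Hypothetical sketch of the Fetch step. Header values are placeholders,
// not the User-Agent OpenClaw actually sends.
const DEFAULT_HEADERS = {
  "User-Agent": "Mozilla/5.0 (placeholder Chrome-like UA)",
  "Accept-Language": "en-US,en;q=0.9",
};

async function fetchWithRedirectChecks(url, { fetchImpl = fetch, isAllowed, maxRedirects = 3 }) {
  let current = url;
  // One initial request plus at most maxRedirects followed redirects.
  for (let hop = 0; hop <= maxRedirects; hop++) {
    if (!isAllowed(new URL(current).hostname)) {
      throw new Error(`blocked host: ${current}`);
    }
    // With redirect: "manual", Node's fetch returns the 3xx response instead
    // of following it (browsers return an opaque response instead).
    const res = await fetchImpl(current, { method: "GET", headers: DEFAULT_HEADERS, redirect: "manual" });
    const location = res.headers.get("location");
    if (res.status >= 300 && res.status < 400 && location) {
      current = new URL(location, current).href; // resolve relative redirects
      continue; // the next iteration re-checks the new target
    }
    return res;
  }
  throw new Error(`too many redirects (max ${maxRedirects})`);
}
```

Injecting fetchImpl keeps the redirect logic testable without network access.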

Extract

Runs Readability (main-content extraction) on the HTML response.

Fallback (optional)

If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot-circumvention mode.
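A sketch of the Extract and Fallback decision flow, including the case where Readability is disabled. readabilityExtract and fallbackProvider are hypothetical injected stand-ins for a Readability wrapper and a provider such as Firecrawl. What web_fetch returns when Readability fails and no provider is configured is an assumption here: the sketch simply throws.

```javascript
// Hypothetical sketch of Extract + Fallback. readabilityExtract(html, url)
// returns extracted content or null; fallbackProvider is { name, fetch(url) }.
async function extractContent(html, url, { readability = true, readabilityExtract, fallbackProvider }) {
  if (readability) {
    const article = readabilityExtract(html, url); // null/undefined means extraction failed
    if (article) return { source: "readability", content: article };
  }
  // Readability failed or is disabled: try the selected fallback provider.
  if (fallbackProvider) {
    return { source: fallbackProvider.name, content: await fallbackProvider.fetch(url) };
  }
  // No provider available: in this sketch, fail instead of returning raw HTML.
  throw new Error("extraction failed and no fallback provider is available");
}
```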

Cache

Results are cached for 15 minutes by default (configurable via cacheTtlMinutes) to avoid repeated fetches of the same URL.
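A minimal TTL cache illustrating the Cache step. FetchCache is hypothetical, and keying on both URL and extractMode is an assumption; the injectable clock exists only to make expiry easy to test.

```javascript
// Hypothetical sketch of a per-URL result cache with a TTL in minutes.
class FetchCache {
  constructor(ttlMinutes = 15, now = () => Date.now()) {
    this.ttlMs = ttlMinutes * 60_000;
    this.now = now; // injectable clock
    this.entries = new Map();
  }

  key(url, extractMode) {
    return `${extractMode}:${url}`;
  }

  get(url, extractMode) {
    const k = this.key(url, extractMode);
    const hit = this.entries.get(k);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt >= this.ttlMs) {
      this.entries.delete(k); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(url, extractMode, value) {
    this.entries.set(this.key(url, extractMode), { value, storedAt: this.now() });
  }
}
```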

Config

json5

{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        useTrustedEnvProxy: false, // let a trusted HTTP(S) env proxy resolve DNS
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
        ssrfPolicy: {
          allowRfc2544BenchmarkRange: true, // opt-in for trusted fake-IP proxies using 198.18.0.0/15
          allowIpv6UniqueLocalRange: true, // opt-in for trusted fake-IP proxies using fc00::/7
        },
      },
    },
  },
}

Firecrawl fallback

If Readability extraction fails, web_fetch can fall back to Firecrawl for bot-circumvention and better extraction:

json5

{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 86400000, // cache duration (1 day)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}

plugins.entries.firecrawl.config.webFetch.apiKey supports SecretRef objects. Legacy tools.web.fetch.firecrawl.* config is auto-migrated by openclaw doctor --fix.
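One way to read "optional if FIRECRAWL_API_KEY is set" is as a simple precedence rule: an explicit config value first, then the environment variable. The helper below is hypothetical, the precedence order is an assumption, and it does not model SecretRef objects (only plain strings).

```javascript
// Hypothetical sketch: resolve the Firecrawl API key for web_fetch.
// Assumed precedence: plugin config string, then FIRECRAWL_API_KEY.
// SecretRef objects are intentionally not handled here.
function resolveFirecrawlApiKey(webFetchConfig = {}, env = process.env) {
  if (typeof webFetchConfig.apiKey === "string" && webFetchConfig.apiKey.length > 0) {
    return webFetchConfig.apiKey;
  }
  return env.FIRECRAWL_API_KEY || undefined; // undefined when neither is set
}
```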

Current runtime behavior:

tools.web.fetch.provider selects the fetch fallback provider explicitly.
If provider is omitted, OpenClaw auto-detects the first ready web-fetch provider from available credentials. Non-sandboxed web_fetch can use installed plugins that declare contracts.webFetchProviders and register a matching provider at runtime. Today the bundled provider is Firecrawl.
Sandboxed web_fetch calls stay limited to bundled providers.
If Readability is disabled, web_fetch skips straight to the selected provider fallback. If no provider is available, it fails closed.
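The provider-selection rules above can be sketched as a small pure function. The provider shape (id, bundled, isReady) is hypothetical. Whether an explicitly configured provider must also have ready credentials is not stated above, so the sketch does not check it.

```javascript
// Hypothetical sketch of fetch-fallback provider selection.
// Each provider: { id: string, bundled: boolean, isReady: () => boolean }.
function selectFetchProvider({ configuredId, providers, sandboxed = false }) {
  // Sandboxed calls only see bundled providers; others may also use plugins.
  const candidates = sandboxed ? providers.filter((p) => p.bundled) : providers;
  if (configuredId) {
    // Explicit tools.web.fetch.provider: pick it if it is a candidate.
    return candidates.find((p) => p.id === configuredId) ?? null;
  }
  // Auto-detect: the first provider whose credentials are ready.
  return candidates.find((p) => p.isReady()) ?? null;
}
```

A null result corresponds to "no provider is available", where web_fetch fails closed.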

Trusted env proxy

If your deployment requires web_fetch to go through a trusted outbound HTTP(S) proxy, set tools.web.fetch.useTrustedEnvProxy: true.

In this mode, OpenClaw still applies hostname-based SSRF checks before sending the request, but it lets the proxy resolve DNS instead of doing local DNS pinning. Enable this only when the proxy is operator-controlled and enforces outbound policy after DNS resolution.
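A sketch of how useTrustedEnvProxy changes the request plan. planRequest is hypothetical, and treating HTTPS_PROXY/HTTP_PROXY (and their lowercase variants) as the trusted proxy variables is an assumption. The point is that hostname checks run in both modes, while local DNS resolution and pinning are skipped only when the trusted proxy is used.

```javascript
// Hypothetical sketch: decide between direct (DNS-pinned) and trusted-proxy modes.
function planRequest(url, { useTrustedEnvProxy = false, isHostnameAllowed, env = {} }) {
  const { protocol, hostname } = new URL(url);
  // Hostname-based SSRF checks run regardless of mode.
  if (!isHostnameAllowed(hostname)) throw new Error(`blocked host: ${hostname}`);
  const proxy = protocol === "https:"
    ? env.HTTPS_PROXY || env.https_proxy
    : env.HTTP_PROXY || env.http_proxy;
  if (useTrustedEnvProxy && proxy) {
    // The proxy resolves DNS and must enforce outbound policy itself.
    return { via: proxy, resolveDnsLocally: false };
  }
  // Default in this sketch: resolve locally, validate the IPs, and pin them.
  return { via: null, resolveDnsLocally: true };
}
```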

Limits and safety

maxChars is clamped to tools.web.fetch.maxCharsCap
Response body is capped at maxResponseBytes before parsing; oversized responses are truncated with a warning
Private/internal hostnames are blocked
tools.web.fetch.ssrfPolicy.allowRfc2544BenchmarkRange and tools.web.fetch.ssrfPolicy.allowIpv6UniqueLocalRange are narrow opt-ins for trusted fake-IP proxy stacks; leave them unset unless your proxy owns those synthetic ranges and enforces its own destination policy
Redirects are checked and limited by maxRedirects
useTrustedEnvProxy is an explicit opt-in and should only be enabled for operator-controlled proxies that still enforce outbound policy after DNS resolution
web_fetch is best-effort -- some sites need the Web Browser
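Two of the limits above, sketched in plain JavaScript: clamping maxChars to maxCharsCap, and classifying IP literals with the two ssrfPolicy opt-ins. The helper names and the baseline list of blocked IPv4 ranges are assumptions, and the check is deliberately incomplete (for example, it ignores IPv4-mapped IPv6 addresses), so treat it as an illustration rather than a production SSRF filter.

```javascript
// Hypothetical sketch: maxChars clamping and IP classification.
function clampMaxChars(requested, { maxChars, maxCharsCap }) {
  // Fall back to the configured default, then apply the hard cap.
  return Math.min(requested ?? maxChars, maxCharsCap);
}

function ipv4ToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) return null;
  return ((parts[0] << 24) >>> 0) + (parts[1] << 16) + (parts[2] << 8) + parts[3];
}

function inCidr(ipInt, base, bits) {
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipInt & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Assumed baseline of private/internal IPv4 ranges (not exhaustive).
const BLOCKED_V4 = [
  ["0.0.0.0", 8], ["10.0.0.0", 8], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["172.16.0.0", 12], ["192.168.0.0", 16],
];

function isBlockedIp(ip, policy = {}) {
  const v4 = ipv4ToInt(ip);
  if (v4 !== null) {
    if (BLOCKED_V4.some(([base, bits]) => inCidr(v4, base, bits))) return true;
    // RFC 2544 benchmark range 198.18.0.0/15: blocked unless explicitly allowed.
    return inCidr(v4, "198.18.0.0", 15) && !policy.allowRfc2544BenchmarkRange;
  }
  const v6 = ip.toLowerCase();
  if (v6 === "::1") return true; // IPv6 loopback
  // fc00::/7 means a first group of fcXX or fdXX (always four hex digits).
  if (/^f[cd][0-9a-f]{2}:/.test(v6)) return !policy.allowIpv6UniqueLocalRange;
  return false;
}
```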

Tool profiles

If you use tool profiles or allowlists, add web_fetch or group:web:

json5

{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"]  (includes web_fetch, web_search, and x_search)
  },
}
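Allowlist matching with group expansion can be sketched as follows. The membership of group:web is taken from the comment in the config above; isToolAllowed and TOOL_GROUPS are hypothetical names, not OpenClaw's internals.

```javascript
// Hypothetical sketch: expand group entries, then check membership.
const TOOL_GROUPS = { "group:web": ["web_fetch", "web_search", "x_search"] };

function isToolAllowed(tool, allow) {
  // Group names expand to their members; plain tool names pass through.
  const expanded = new Set(allow.flatMap((entry) => TOOL_GROUPS[entry] ?? [entry]));
  return expanded.has(tool);
}
```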

Related

Web Search -- search the web with multiple providers
Web Browser -- full browser automation for JS-heavy sites
Firecrawl -- Firecrawl search and scrape tools
