
Web fetch


The web_fetch tool does a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript.

For JS-heavy sites or login-protected pages, use the Web Browser instead.

Quick start

web_fetch is enabled by default, so no configuration is needed. The agent can call it immediately:

```javascript
await web_fetch({ url: "https://example.com/article" });
```

Tool parameters

url (string, required)

URL to fetch. http(s) only.

extractMode ('markdown' | 'text', default: 'markdown')

Output format after main-content extraction.

maxChars (number)

Truncate output to this many characters.
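
The parameter rules above can be sketched as a small validation step. This is a minimal sketch, not OpenClaw's implementation: normalizeWebFetchParams and FETCH_CONFIG are hypothetical names, and the defaults (maxChars and maxCharsCap both 50000) are assumed from the config example on this page.

```javascript
// Hypothetical sketch of web_fetch parameter handling, not OpenClaw's code.
// Assumed defaults, copied from the config example on this page.
const FETCH_CONFIG = { maxChars: 50000, maxCharsCap: 50000 };

function normalizeWebFetchParams(params, config = FETCH_CONFIG) {
  // url: required; only http: and https: are accepted.
  let parsed;
  try {
    parsed = new URL(params.url);
  } catch {
    throw new Error(`invalid url: ${params.url}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported scheme: ${parsed.protocol}`);
  }

  // extractMode: "markdown" by default, or "text".
  const extractMode = params.extractMode ?? "markdown";
  if (extractMode !== "markdown" && extractMode !== "text") {
    throw new Error(`unsupported extractMode: ${extractMode}`);
  }

  // maxChars: configured default when omitted, always clamped to maxCharsCap.
  const maxChars = Math.min(params.maxChars ?? config.maxChars, config.maxCharsCap);

  return { url: parsed.href, extractMode, maxChars };
}

// A request for 120000 characters is clamped to the 50000 cap.
console.log(normalizeWebFetchParams({ url: "https://example.com/article", maxChars: 120000 }).maxChars); // prints 50000
```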

How it works

  • Fetch

    Sends an HTTP GET with a Chrome-like User-Agent and an Accept-Language header. Blocks private/internal hostnames and re-checks the target of every redirect.

  • Extract

    Runs Readability (main-content extraction) on the HTML response.

  • Fallback (optional)

    If Readability fails and Firecrawl is configured, retries through the Firecrawl API with bot-circumvention mode.

  • Cache

    Results are cached for 15 minutes by default (configurable via cacheTtlMinutes) to avoid repeated fetches of the same URL.
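
The Cache step amounts to a time-to-live cache keyed by URL. The sketch below is hypothetical: TtlCache and its key format (which also includes extractMode and maxChars, on the assumption that different options produce different output) are illustrative, and the 15-minute default mirrors cacheTtlMinutes.

```javascript
// Hypothetical sketch: a TTL cache for fetch results, not OpenClaw's code.
class TtlCache {
  constructor(ttlMinutes = 15, now = () => Date.now()) {
    this.ttlMs = ttlMinutes * 60 * 1000; // 15 minutes = 900000 ms
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  // Assumption: output also depends on extractMode and maxChars, so both are part of the key.
  static key(url, extractMode, maxChars) {
    return `${extractMode}|${maxChars}|${url}`;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock: a hit at 14 minutes, a miss at 15.
let fakeNow = 0;
const cache = new TtlCache(15, () => fakeNow);
const articleKey = TtlCache.key("https://example.com/article", "markdown", 50000);
cache.set(articleKey, "# Article");
fakeNow = 14 * 60 * 1000;
console.log(cache.get(articleKey)); // prints # Article
fakeNow = 15 * 60 * 1000;
console.log(cache.get(articleKey)); // prints undefined
```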

Config

```json5
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        useTrustedEnvProxy: false, // let a trusted HTTP(S) env proxy resolve DNS
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
        ssrfPolicy: {
          allowRfc2544BenchmarkRange: true, // opt-in for trusted fake-IP proxies using 198.18.0.0/15
          allowIpv6UniqueLocalRange: true, // opt-in for trusted fake-IP proxies using fc00::/7
        },
      },
    },
  },
}
```
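
maxResponseBytes bounds how much of the body is read before parsing. A minimal sketch of that cap, assuming the body arrives as a sequence of Uint8Array chunks; capResponseBody and its return shape are hypothetical, not OpenClaw's code.

```javascript
// Hypothetical sketch: keep at most maxResponseBytes of a chunked body and
// flag truncation so a warning can be reported.
function capResponseBody(chunks, maxResponseBytes) {
  const kept = [];
  let total = 0;
  let truncated = false;
  for (const chunk of chunks) {
    const remaining = maxResponseBytes - total;
    if (chunk.length > remaining) {
      kept.push(chunk.subarray(0, remaining)); // keep only the bytes that fit
      total += remaining;
      truncated = true;
      break; // stop reading; later chunks are never buffered
    }
    kept.push(chunk);
    total += chunk.length;
  }
  // Concatenate the kept chunks into one buffer for parsing.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of kept) {
    body.set(chunk, offset);
    offset += chunk.length;
  }
  return { body, truncated };
}

// Usage: an 8-byte cap cuts "hello world" short.
const encoder = new TextEncoder();
const { body, truncated } = capResponseBody([encoder.encode("hello "), encoder.encode("world")], 8);
console.log(new TextDecoder().decode(body), truncated); // prints hello wo true
```

With a streaming response body, the same loop could be written with `for await` so the download stops as soon as the cap is reached.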

Firecrawl fallback

If Readability extraction fails, web_fetch can fall back to Firecrawl for bot circumvention and better extraction:

```json5
{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 86400000, // cache duration (1 day)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}
```

plugins.entries.firecrawl.config.webFetch.apiKey supports SecretRef objects. Legacy tools.web.fetch.firecrawl.* settings are migrated automatically by running openclaw doctor --fix.
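
The comment on apiKey implies a simple precedence: an explicit key in the plugin config, otherwise the FIRECRAWL_API_KEY environment variable. The sketch assumes that order and assumes any SecretRef has already been resolved to a string; resolveFirecrawlApiKey is a hypothetical name, and pluginConfig stands for the plugins.entries.firecrawl.config object.

```javascript
// Hypothetical sketch, not OpenClaw's code. Assumes config wins over the
// environment and that any SecretRef was already resolved to a string.
function resolveFirecrawlApiKey(pluginConfig, env = process.env) {
  const configured = pluginConfig?.webFetch?.apiKey;
  if (typeof configured === "string" && configured.trim() !== "") {
    return configured;
  }
  const fromEnv = env.FIRECRAWL_API_KEY;
  if (typeof fromEnv === "string" && fromEnv.trim() !== "") {
    return fromEnv;
  }
  // No credentials: Firecrawl cannot serve as the fetch fallback.
  return undefined;
}

// Usage: the plugin config omits apiKey, so the environment variable is used.
console.log(resolveFirecrawlApiKey({ webFetch: {} }, { FIRECRAWL_API_KEY: "fc-from-env" })); // prints fc-from-env
```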

Current runtime behavior:

• tools.web.fetch.provider selects the fetch fallback provider explicitly.
• If provider is omitted, OpenClaw auto-detects the first ready web-fetch provider from available credentials. Non-sandboxed web_fetch calls can also use installed plugins that declare contracts.webFetchProviders and register a matching provider at runtime. Currently, the bundled provider is Firecrawl.
• Sandboxed web_fetch calls are limited to bundled providers.
• If Readability is disabled, web_fetch skips straight to the selected fallback provider. If no provider is available, it fails closed.
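
The runtime rules above can be read as a small decision procedure. In this sketch, selectFallbackProvider, extractContent, and the provider object shape (id, bundled, isReady, fetch) are hypothetical. The rules do not say what happens when Readability fails and no provider is configured, so the sketch assumes that case also fails closed.

```javascript
// Hypothetical sketch of the runtime rules, not OpenClaw's code.
// A provider is assumed to look like { id, bundled, isReady(), fetch(url) }.
function selectFallbackProvider({ providers, explicitId, sandboxed }) {
  // Sandboxed calls only consider bundled providers.
  const candidates = sandboxed ? providers.filter((p) => p.bundled) : providers;
  if (explicitId) {
    // tools.web.fetch.provider names the provider explicitly.
    return candidates.find((p) => p.id === explicitId && p.isReady()) ?? null;
  }
  // Auto-detect: the first provider whose credentials are available.
  return candidates.find((p) => p.isReady()) ?? null;
}

function extractContent({ url, html, readability, runReadability, provider }) {
  if (readability) {
    const text = runReadability(html); // assumed to return null on failure
    if (text !== null) return { source: "readability", content: text };
  }
  // Readability disabled or failed: go to the selected provider.
  if (provider) return { source: provider.id, content: provider.fetch(url) };
  // No provider: fail closed (assumed for the Readability-failed case too).
  throw new Error("web_fetch: no extraction result and no fallback provider");
}
```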

Trusted env proxy

If your deployment requires web_fetch to go through a trusted outbound HTTP(S) proxy, set tools.web.fetch.useTrustedEnvProxy: true.

In this mode, OpenClaw still applies hostname-based SSRF checks before sending the request, but it lets the proxy resolve DNS instead of doing local DNS pinning. Enable this only when the proxy is operator-controlled and enforces outbound policy after DNS resolution.
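
The two modes differ in where DNS is resolved and where resolved addresses are vetted. A sketch under stated assumptions: planRequest is hypothetical, and checkHostname, resolve, and checkAddress are injected placeholders for whatever checks OpenClaw actually performs.

```javascript
// Hypothetical sketch of the two request modes, not OpenClaw's code.
// checkHostname, resolve and checkAddress are injected placeholders.
function planRequest(url, { useTrustedEnvProxy, checkHostname, resolve, checkAddress }) {
  const { hostname } = new URL(url);

  // Hostname-based SSRF checks run in both modes, before any request is sent.
  if (!checkHostname(hostname)) {
    throw new Error(`blocked hostname: ${hostname}`);
  }

  if (useTrustedEnvProxy) {
    // The operator-controlled proxy resolves DNS and enforces outbound policy.
    return { mode: "env-proxy", hostname };
  }

  // Default: resolve locally, vet every address, then pin the connection to a
  // vetted address so a second lookup cannot swap in a different target.
  const addresses = resolve(hostname);
  if (addresses.length === 0) throw new Error(`no addresses for ${hostname}`);
  for (const address of addresses) {
    if (!checkAddress(address)) {
      throw new Error(`blocked address for ${hostname}: ${address}`);
    }
  }
  return { mode: "pinned", hostname, address: addresses[0] };
}
```

In env-proxy mode, a hostname that resolves to a private address still passes the local hostname check, which is why the proxy itself must enforce outbound policy.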

Limits and safety

• maxChars is clamped to tools.web.fetch.maxCharsCap
• Response body is capped at maxResponseBytes before parsing; oversized responses are truncated with a warning
• Private/internal hostnames are blocked
• tools.web.fetch.ssrfPolicy.allowRfc2544BenchmarkRange and tools.web.fetch.ssrfPolicy.allowIpv6UniqueLocalRange are narrow opt-ins for trusted fake-IP proxy stacks; leave them unset unless your proxy owns those synthetic ranges and enforces its own destination policy
• Redirects are checked and limited by maxRedirects
• useTrustedEnvProxy is an explicit opt-in and should only be enabled for operator-controlled proxies that still enforce outbound policy after DNS resolution
• web_fetch is best-effort; some sites need the Web Browser
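
The private-range check and the two ssrfPolicy opt-ins can be sketched with plain CIDR arithmetic. The block list here is illustrative and likely differs from OpenClaw's; isBlockedAddress is a hypothetical name, and the sketch does not normalize alternate IPv6 spellings (such as fully expanded forms), which a real check must handle.

```javascript
const net = require("node:net");

// Illustrative IPv4 block list only; OpenClaw's actual ranges may differ.
const BLOCKED_V4 = [
  ["0.0.0.0", 8], // "this" network
  ["10.0.0.0", 8], // private
  ["100.64.0.0", 10], // shared address space (carrier-grade NAT)
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, e.g. 169.254.169.254 cloud metadata
  ["172.16.0.0", 12], // private
  ["192.168.0.0", 16], // private
];

function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function inV4Cidr(ip, [base, prefix]) {
  const size = 2 ** (32 - prefix);
  // Same block if both addresses floor to the same multiple of the block size.
  return Math.floor(ipv4ToInt(ip) / size) === Math.floor(ipv4ToInt(base) / size);
}

// Hypothetical sketch: returns true when a resolved address should be refused.
function isBlockedAddress(ip, policy = {}) {
  if (net.isIPv4(ip)) {
    if (BLOCKED_V4.some((range) => inV4Cidr(ip, range))) return true;
    // 198.18.0.0/15 (RFC 2544 benchmarking) is blocked unless opted in.
    if (inV4Cidr(ip, ["198.18.0.0", 15])) return !policy.allowRfc2544BenchmarkRange;
    return false;
  }
  if (net.isIPv6(ip)) {
    const lower = ip.toLowerCase();
    // IPv4-mapped form (::ffff:a.b.c.d): apply the IPv4 rules.
    const mapped = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
    if (mapped) return isBlockedAddress(mapped[1], policy);
    if (lower === "::" || lower === "::1") return true; // unspecified, loopback
    // First 16-bit group; an address starting with "::" has a zero first group.
    const first = lower.startsWith("::") ? 0 : parseInt(lower.split(":")[0], 16);
    if ((first & 0xffc0) === 0xfe80) return true; // fe80::/10 link-local
    // fc00::/7 (unique local) is blocked unless opted in.
    if ((first & 0xfe00) === 0xfc00) return !policy.allowIpv6UniqueLocalRange;
    return false;
  }
  return true; // not an IP literal: refuse rather than guess
}
```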

Tool profiles

If you use tool profiles or allowlists, add web_fetch or group:web:

```json5
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"]  (includes web_fetch, web_search, and x_search)
  },
}
```
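
Allowlist matching with group entries can be sketched as expanding each group into its member tools. TOOL_GROUPS and isToolAllowed are hypothetical names; the members of group:web are taken from the comment in the example above.

```javascript
// Hypothetical sketch of allowlist matching, not OpenClaw's implementation.
const TOOL_GROUPS = new Map([
  ["group:web", ["web_fetch", "web_search", "x_search"]],
]);

function expandAllowlist(allow) {
  const tools = new Set();
  for (const entry of allow) {
    const members = TOOL_GROUPS.get(entry);
    if (members) members.forEach((name) => tools.add(name)); // a group entry
    else tools.add(entry); // a plain tool name
  }
  return tools;
}

function isToolAllowed(tool, allow) {
  return expandAllowlist(allow).has(tool);
}

console.log(isToolAllowed("web_fetch", ["group:web"])); // prints true
```
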

Related

• Web Search: search the web with multiple providers
• Web Browser: full browser automation for JS-heavy sites
• Firecrawl: Firecrawl search and scrape tools