Using wptrunner in Chromium (experimental)

wptrunner is the harness shipped with the WPT project for running the test suite. This user guide documents experimental support in Chromium for wptrunner, which will replace run_web_tests.py for running WPTs in CQ/CI.

For general information on web platform tests, see web-platform-tests.org.

For technical details on the migration to wptrunner in Chromium, see the project plan.

Warning: The project is under active development, so expect some rough edges. This document may be stale.

Differences from run_web_tests.py

The main differences between run_web_tests.py and wptrunner are that:

  1. wptrunner can run both the full chrome binary and the stripped-down content_shell. run_web_tests.py can only run content_shell.
  2. wptrunner communicates with the browser via WebDriver (chromedriver), rather than talking directly to the browser binary.

These differences mean that any feature that works on upstream WPT today (e.g. print-reftests) should work in wptrunner, but conversely, features available to run_web_tests.py (e.g. the internals API) are not yet available to wptrunner.

Running tests locally

The wptrunner wrapper script is //third_party/blink/tools/run_wpt_tests.py. First, build the necessary ninja target:

autoninja -C out/Release wpt_tests_isolate_content_shell

To run the script, run the command below from //third_party/blink/tools:

./run_wpt_tests.py [test list]

Test paths should be given relative to blink/web_tests/ (e.g., wpt_internal/badging/badge-success.https.html). For convenience, the external/wpt/ prefix can be omitted for the external test suite (e.g., webauthn/createcredential-timeout.https.html).

run_wpt_tests.py also accepts directories, which will run all tests under those directories. Omitting the test list will run all WPT tests (both internal and external). Results from the run are placed under //out/<target>/layout-test-results/.

Useful flags:

  • -t/--target: Select which //out/ subdirectory to use, e.g. -t Debug. Defaults to Release.
  • -p/--product: Select which browser (or browser component) to test. Defaults to content_shell, but choices also include chrome, chrome_android, and android_webview.
  • -v: Increase verbosity (may be passed multiple times).
  • --help: Show the help text.
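Putting these flags together, a typical local run might look like the following command fragment. The test path is one of the examples above; the Debug configuration is assumed to have been built first:

```shell
# Build a Debug configuration of the content_shell isolate, then run one
# external test against it with extra logging.
autoninja -C out/Debug wpt_tests_isolate_content_shell
cd third_party/blink/tools
./run_wpt_tests.py -t Debug -v webauthn/createcredential-timeout.https.html
```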

Experimental Builders

As of Q3 2022, wptrunner runs on a handful of experimental FYI CI builders (mostly Linux).

Each of these builders has an opt-in trybot mirror with the same name. To run one of these builders against a CL, click “Choose Tryjobs” in Gerrit, then search for the builder name. A Cq-Include-Trybots: footer in the CL description can add a wptrunner builder to the default CQ builder set. Results for the bots use the existing layout test results viewer.

Expectations

Similar to run_web_tests.py, wptrunner allows engineers to specify what results to expect and which tests to skip. This information is stored in WPT metadata files. Each metadata file is checked in with an .ini suffix appended to its corresponding test file's path:

external/wpt/folder/my-test.html
external/wpt/folder/my-test-expected.txt  <-- run_web_tests.py baseline
external/wpt/folder/my-test.html.ini      <-- wptrunner metadata

A metadata file is roughly equivalent to a run_web_tests.py baseline and the test's corresponding lines in web test expectation files. Metadata files record test and subtest expectations in a structured INI-like text format:

[my-test.html]
  expected: OK
  bug: crbug.com/123  # Comments start with '#'

  [First subtest name (flaky)]
    expected: [PASS, FAIL]  # Expect either a pass or a failure

  [Second subtest name: [\]]  # The backslash escapes a literal ']' in the subtest name
    expected: FAIL

The brackets [...] denote the start of a (sub)test section, which can be hierarchically nested with significant indentation. Each section can contain <key>: <value> pairs. Important keys that wptrunner understands:

  • expected: The statuses to expect.
    • Tests commonly have these harness statuses: OK, ERROR, TIMEOUT, or CRASH (for tests without subtests, like reftests, PASS replaces OK and FAIL replaces ERROR)
    • Subtests commonly have: PASS, FAIL, or TIMEOUT
    • For convenience, wptrunner expects OK or PASS when expected is omitted. Deleting the entire metadata file implies an all-PASS test.
  • disabled: Any nonempty value will disable the test or ignore the subtest result.

Note: As shown in the example above, a testharness.js test may have a test-level status of OK even if some subtests FAIL. This is a common point of confusion: OK only means that the test ran to completion and did not CRASH or TIMEOUT. OK does not imply that every subtest PASSed.

Note: Currently, wptrunner can inherit expectations from TestExpectations files through a translation step. Because this translation loses subtest coverage, we are actively working to deprecate it and use checked-in metadata natively in Chromium.
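To make the nesting rules concrete, here is a toy parser for the format. This is a sketch only: it handles [section] headings, key: value pairs, '#' comments, and indentation-based nesting, but omits escapes (such as the \] shown above), multi-line conditional values, and everything else wptrunner's real wptmanifest parser supports. All names here are hypothetical.

```python
def parse_metadata(text):
    """Parse the INI-like nested format into dicts (illustrative sketch only)."""
    root = {'keys': {}, 'sections': {}}
    stack = [(-1, root)]  # (indent, node); the root swallows everything
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].rstrip()  # strip comments
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        while indent <= stack[-1][0]:  # close sections at deeper/equal indent
            stack.pop()
        node = stack[-1][1]
        if stripped.startswith('[') and stripped.endswith(']'):
            child = {'keys': {}, 'sections': {}}
            node['sections'][stripped[1:-1]] = child
            stack.append((indent, child))
        else:
            key, _, value = stripped.partition(':')
            node['keys'][key.strip()] = value.strip()
    return root

tree = parse_metadata("""\
[my-test.html]
  expected: OK
  [First subtest]
    expected: [PASS, FAIL]
""")
```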

Conditional Values

run_web_tests.py encodes platform- or flag-specific results using platform tags in test expectations, separate FlagExpectations/* files, and baseline fallback. WPT metadata uses a Python-like conditional syntax instead to store all expectations in one file:

[my-test.html]
  expected:
    if not debug: FAIL
    if os == "mac" or (os == "linux" and version != "trusty"): [FAIL, PASS]
    TIMEOUT  # If no branch matches, use this default value.

To evaluate a conditional value, wptrunner takes the right-hand side of the first branch where the condition evaluates to a truthy value. Conditions can contain arbitrary Python-like boolean expressions that will be evaluated against properties (i.e., variables) pulled from the test environment. Properties available in Chromium are shown below:
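The first-truthy-branch rule can be sketched in a few lines of Python. This is an illustration, not wptrunner's implementation: the real harness parses its own expression grammar, whereas this sketch stands in with a restricted eval() over a dict of properties.

```python
def resolve_expected(branches, default, properties):
    """Return the value of the first branch whose condition is truthy.

    branches: list of (condition_string, value) pairs, in file order.
    default: value used when no condition matches.
    properties: dict of test-environment properties (os, debug, ...).
    """
    for condition, value in branches:
        # wptrunner evaluates its own expression grammar; a restricted
        # eval() of a Python-like expression stands in for it here.
        if eval(condition, {"__builtins__": {}}, dict(properties)):
            return value
    return default

# The branches from the example above:
branches = [
    ('not debug', 'FAIL'),
    ('os == "mac" or (os == "linux" and version != "trusty")', ['FAIL', 'PASS']),
]
# A release (non-debug) run takes the first branch:
resolve_expected(branches, 'TIMEOUT', {'os': 'linux', 'version': 'focal', 'debug': False})  # → 'FAIL'
# A debug Mac run falls through to the second branch:
resolve_expected(branches, 'TIMEOUT', {'os': 'mac', 'version': '12', 'debug': True})  # → ['FAIL', 'PASS']
```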

Property       Type  Description                   Choices
os             str   OS family                     linux, android
version        str   OS version                    Depends on os
product        str   Browser or browser component  chrome, content_shell, chrome_android, android_webview
processor      str   CPU specifier                 arm, x86, x86_64
flag_specific  str   Flag-specific suite name      See FlagSpecificConfig (will be falsy for the generic suite)
debug          bool  is_debug build?               N/A

Test Parameterization

The WPT suite supports forms of test parameterization where a test file on disk may map to more than one test ID: multiglobal .any.js tests and test variants. The metadata for these parameterizations lives in the same file (the test file path with the .ini suffix), but under different top-level sections. For example, suppose a test external/wpt/a.any.js generates test IDs a.any.html?b, a.any.html?c, a.any.worker.html?b, and a.any.worker.html?c. Then a file named external/wpt/a.any.js.ini stores expectations for all parameterizations:

[a.any.html?b]
  expected: OK

[a.any.html?c]
  expected: CRASH

[a.any.worker.html?b]
  expected: TIMEOUT

[a.any.worker.html?c]
  expected: TIMEOUT
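The fan-out from one file to several test IDs can be sketched as a simple cross product of global scopes and query-string variants. The helper name and the default scope list below are hypothetical; the real expansion is done by the WPT manifest machinery.

```python
def expand_any_js(path, scopes=('any.html', 'any.worker.html'), variants=('',)):
    """Cross the global scopes with the query-string variants (sketch)."""
    stem = path[:-len('.any.js')]  # strip the multiglobal suffix
    return [f'{stem}.{scope}{variant}' for scope in scopes for variant in variants]

ids = expand_any_js('external/wpt/a.any.js', variants=('?b', '?c'))
# → ['external/wpt/a.any.html?b', 'external/wpt/a.any.html?c',
#    'external/wpt/a.any.worker.html?b', 'external/wpt/a.any.worker.html?c']
```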

Directory-Wide Expectations

To set expectations or disable tests under a directory without editing an .ini file for every test, place a file named __dir__.ini under the desired directory with contents like:

expected:
  if os == "linux": CRASH
disabled:
  if flag_specific == "highdpi": skip highdpi for these non-rendering tests

Note that there is no section heading [my-test.html], but the keys work exactly the same as for per-test metadata.

Metadata closer to the affected test file takes higher precedence. For example, expectations set by a/b/test.html.ini override those of a/b/__dir__.ini, which in turn overrides a/__dir__.ini.

The special value disabled: @False can selectively reenable child tests or directories that would have been disabled by a parent __dir__.ini.
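The precedence and re-enablement rules above can be sketched as a layered merge. This is a simplified model, not wptrunner's code: real metadata values can be conditional, and file discovery is elided, so the layers are plain dicts here.

```python
def effective_metadata(layers):
    """Merge metadata layers ordered from root __dir__.ini to the test .ini.

    Later (closer) layers override earlier ones key by key.
    """
    merged = {}
    for layer in layers:
        merged.update(layer)
    # "@False" cancels an inherited "disabled" value entirely.
    if merged.get('disabled') == '@False':
        del merged['disabled']
    return merged

layers = [
    {'disabled': 'skip on this bot', 'expected': 'CRASH'},  # a/__dir__.ini
    {'disabled': '@False'},                                 # a/b/__dir__.ini
    {'expected': 'FAIL'},                                   # a/b/test.html.ini
]
effective_metadata(layers)  # → {'expected': 'FAIL'}
```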

Tooling

To help update expectations in bulk, blink_tool.py has an update-metadata subcommand that can automatically update expectations from try job results (similar to rebaseline-cl). Example invocation:

./blink_tool.py update-metadata --verbose --bug=123 \
    --build=linux-wpt-content-shell-fyi-rel:30 css/

This will update the expected statuses for external/wpt/css/ (sub)tests that ran unexpectedly on build 30 of linux-wpt-content-shell-fyi-rel. Any updated test section will be annotated with bug: crbug.com/123.

Known issues

  • There is no debugging support in run_wpt_tests.py today. In the future, we intend to allow pausing the browser after each test, and (long-term) intend to support hooking up gdb to test runs.
  • There is not yet support for non-Linux platforms. We would love for you to try it on other operating systems and file bugs against us if it doesn't work!

Please file bugs and feature requests against Blink>Infra, tagging the title with [wptrunner].