# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Unlike the majority of the project's test machines, Chromium's GPU bots are
physical pieces of hardware. When end users run the Chrome browser, they
are almost certainly running it on a physical piece of hardware with a real
graphics processor. Some portions of the code base simply cannot be
exercised by running the browser in a virtual machine, or on a software
implementation of the underlying graphics libraries. The GPU bots were
developed and deployed in order to cover these code paths, and avoid
regressions that are otherwise inevitable in a project the size of the Chromium
browser.

The GPU bots are used on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

All of the physical hardware for the bots lives in the Swarming pool, and most
of it in the Chrome-GPU Swarming pool. The waterfall bots are simply virtual
machines which spawn Swarming tasks with the appropriate tags to get them to run
on the desired GPU and operating system type. So, for example, the [Win10
Release (NVIDIA)] bot is actually a virtual machine which spawns all of its jobs
with the Swarming parameters:

[Win10 Release (NVIDIA)]: https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Release%20%28NVIDIA%29/?limit=200

```json
{
  "gpu": "10de:1cb3-23.21.13.8816",
  "os": "Windows-10",
  "pool": "Chrome-GPU"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few Windows
7-like NVIDIA bots in the pool, which necessitates the OS specifier.)

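The way these parameters select hardware can be sketched in a few lines of
Python. This is a simplified model, not Swarming's actual scheduler: the bot
names are made up, each bot is assumed to expose exactly one value per
dimension, and only the Windows 10 dimension values come from the example
above.

```python
# Simplified model: each bot exposes exactly one value per dimension.
# Bot names and the non-Windows-10 entries are made up for illustration.
BOTS = {
    "win10-nvidia-a": {
        "gpu": "10de:1cb3-23.21.13.8816",
        "os": "Windows-10",
        "pool": "Chrome-GPU",
    },
    "win7-nvidia-a": {
        "gpu": "10de:1cb3-23.21.13.8816",
        "os": "Windows-2008ServerR2-SP1",
        "pool": "Chrome-GPU",
    },
    "linux-nvidia-a": {
        "gpu": "10de:1cb3-384.90",
        "os": "Ubuntu-14.04",
        "pool": "Chrome-GPU",
    },
}


def matching_bots(requested, bots=BOTS):
    """Return the sorted names of bots that satisfy every requested dimension."""
    return sorted(
        name
        for name, dimensions in bots.items()
        if all(dimensions.get(key) == value for key, value in requested.items())
    )


win10_task = {
    "gpu": "10de:1cb3-23.21.13.8816",
    "os": "Windows-10",
    "pool": "Chrome-GPU",
}
# Dropping "os" shows why the OS specifier is needed.
without_os = {key: value for key, value in win10_task.items() if key != "os"}

print(matching_bots(win10_task))
print(matching_bots(without_os))
```

With the full set of dimensions only the Windows 10 bot matches; without `os`,
the hypothetical Windows 7-like bot with the same GPU would match as well.
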
Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials you will be able to make
queries of the bots and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.

The tryservers like `win_chromium_rel_ng` which include GPU tests, on the other
hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers run tests on NVIDIA and AMD GPUs; the Mac tryservers run tests on
Intel and NVIDIA GPUs. The way these tryservers' tests are specified is simply
by *mirroring* how one or more waterfall bots work. This is an inherent
property of the [`chromium_trybot` recipe][chromium_trybot.py], which was
designed to eliminate differences in behavior between the tryservers and
waterfall bots. Since the tryservers mirror waterfall bots, if the waterfall
bot is working, the tryserver is almost certainly working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py

There are a few one-off GPU configurations on the waterfall where the tests are
run locally on physical hardware, rather than via Swarming. A few examples are:

<!-- XXX: update this list -->
* [Mac Pro Release (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Release%20%28AMD%29/)
* [Mac Pro Debug (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Debug%20%28AMD%29/)
* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28Intel%20HD%20630%29/)
* [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI wiki] for
background on Isolates and Swarming.

<!-- XXX: broken link -->
[LUCI wiki]: https://github.com/luci/luci-py/wiki

### Adding a new isolate

1. Define your target using the `template("test")` template in
    [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
    [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
    example which invokes a series of scripts which finally launches the
    browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl] that refers to
    your target. Find a similar target to yours in order to determine the
    `type`. The type is referenced in [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
<!-- XXX: broken link -->
[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl

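A `.pyl` file is a Python literal, so an entry like the one added in step 2 can
be inspected with the standard library. The sketch below is illustrative only:
the target name `my_gpu_tests`, its `label`, and the `type` value are
placeholders rather than entries copied from the real file, and the `label`
key is an assumption about the entry layout. Copy the shape of a similar
existing entry rather than this one.

```python
import ast

# Hypothetical fragment in the style of gn_isolate_map.pyl. The target name,
# label, and type below are placeholders, not real entries.
PYL_TEXT = """
{
  "my_gpu_tests": {
    "label": "//gpu:my_gpu_tests",
    "type": "console_test_launcher",
  },
}
"""


def load_pyl(text):
    """Parse a .pyl (Python literal) document into plain Python objects."""
    return ast.literal_eval(text.strip())


def describe(isolate_map):
    """Return one summary line per target, sorted by target name."""
    return [
        "{}: {} ({})".format(name, entry["label"], entry["type"])
        for name, entry in sorted(isolate_map.items())
    ]


for line in describe(load_pyl(PYL_TEXT)):
    print(line)
```
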
At this point you can build and upload your isolate to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The steps
below are a copy of them, and show how to run an isolate that's been uploaded
to the isolate server on your local machine rather than on Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

From within `src/`:

1. `./tools/mb/mb.py isolate //out/Release [target name]`
    * For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1. `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
    * For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1. This will write a hash to stdout. You can run it via:
    `python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`

See the section below on [isolate server credentials](#Isolate-server-credentials).

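If you run these steps often, the command lines can be assembled by a small
helper. The sketch below only builds and prints the three commands listed
above; it does not execute them. The helper names are hypothetical, and `HASH`
stands in for the value that `batcharchive` prints.

```python
import shlex

ISOLATE_SERVER = "https://isolateserver.appspot.com"


def build_and_upload_commands(target, out_dir="out/Release"):
    """Return the argv lists for the `mb.py isolate` and `batcharchive` steps."""
    return [
        ["./tools/mb/mb.py", "isolate", "//" + out_dir, target],
        [
            "python", "tools/swarming_client/isolate.py", "batcharchive",
            "-I", ISOLATE_SERVER,
            "{}/{}.isolated.gen.json".format(out_dir, target),
        ],
    ]


def run_isolated_command(isolated_hash, extra_args=()):
    """Return the argv list that runs an uploaded isolate locally."""
    return [
        "python", "tools/swarming_client/run_isolated.py",
        "-I", ISOLATE_SERVER, "-s", isolated_hash, "--",
    ] + list(extra_args)


# Print the commands instead of running them, so they can be reviewed first.
for argv in build_and_upload_commands("angle_end2end_tests"):
    print(shlex.join(argv))
print(shlex.join(run_isolated_command("HASH")))
```
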
### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [tools/build] workspace:

* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
    * builders.pyl in these two directories defines the bots that show up on
      the waterfall. If you are adding a new bot, you need to add it to
      builders.pyl and use go/bug-a-trooper to request a restart of either
      master.chromium.gpu or master.chromium.gpu.fyi.
    * Only changes under masters/ require a waterfall restart. All other
      changes – for example, to scripts/slave/ in this workspace, or the
      Chromium workspace – do not require a master restart (and go live the
      minute they are committed).
* `scripts/slave/recipe_modules/chromium_tests/`:
    * <code>[chromium_gpu.py]</code> and
      <code>[chromium_gpu_fyi.py]</code> define the following for
      each builder and tester:
        * How the workspace is checked out (e.g., this is where top-of-tree
          ANGLE is specified)
        * The build configuration (e.g., this is where 32-bit vs. 64-bit is
          specified)
        * Various gclient defines (like compiling in the hardware-accelerated
          video codecs, and enabling compilation of certain tests, like the
          dEQP tests, that can't be built on all of the Chromium builders)
        * Note that the GN configuration of the bots is also controlled by
          <code>[mb_config.pyl]</code> in the Chromium workspace; see below.
    * <code>[trybots.py]</code> defines how try bots *mirror* one or more
      waterfall bots.
        * The concept of try bots mirroring waterfall bots ensures there are
          no differences in behavior between the waterfall bots and the try
          bots. This helps ensure that a CL will not pass the commit queue
          and then break on the waterfall.
        * This file defines the behavior of the following GPU-related try
          bots:
            * `linux_chromium_rel_ng`, `mac_chromium_rel_ng`, and
              `win_chromium_rel_ng`, which run against every Chromium CL, and
              which mirror the behavior of bots on the chromium.gpu
              waterfall.
            * The ANGLE try bots, which run against ANGLE CLs, and mirror the
              behavior of the chromium.gpu.fyi waterfall (including using
              top-of-tree ANGLE, and running additional tests not run by the
              regular Chromium try bots)
            * The optional GPU try servers `linux_optional_gpu_tests_rel`,
              `mac_optional_gpu_tests_rel` and
              `win_optional_gpu_tests_rel`, which are triggered manually and
              run some tests which can't be run on the regular Chromium try
              servers mainly due to lack of hardware capacity.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py

In the [chromium/src] workspace:

* [src/testing/buildbot]:
    * <code>[chromium.gpu.json]</code> and
      <code>[chromium.gpu.fyi.json]</code> define which steps are run on
      which bots. These files are autogenerated. Don't modify them directly!
    * <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior in the GN
      build.
* [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    * Defines the GN arguments for all of the bots.
* [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
    * The generator script for all the waterfalls, including `chromium.gpu.json` and
      `chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
    * See the [README for generate_buildbot_json.py] for documentation
      on this script and the descriptions of the waterfalls and test suites.
    * When modifying this script, don't forget to also run it, to regenerate
      the JSON files. Don't worry; the presubmit step will catch this if you forget.
    * See [Adding new steps to the GPU bots] for more details.

[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py
[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/waterfalls.pyl
[README for generate_buildbot_json.py]: ../../testing/buildbot/README.md

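Because the generated JSON files should only be read, never edited, a quick
script is a convenient way to see which steps land on which bot. The sketch
below assumes a layout in which each bot name maps to optional `gtest_tests`
and `isolated_scripts` lists; that layout, the bot names, and the script step
name are assumptions for illustration, so check them against the real files
before relying on this.

```python
import json

# Hypothetical, heavily trimmed stand-in for a generated waterfall JSON file.
GENERATED = """
{
  "Example Release (CoolNewGPUType)": {
    "gtest_tests": [{"test": "gl_tests"}],
    "isolated_scripts": [{"name": "example_script_step"}]
  },
  "Example Builder": {}
}
"""


def steps_by_bot(text):
    """Map each bot name to the names of the steps listed for it."""
    steps = {}
    for bot, config in json.loads(text).items():
        names = [entry["test"] for entry in config.get("gtest_tests", [])]
        names += [entry["name"] for entry in config.get("isolated_scripts", [])]
        steps[bot] = names
    return steps


for bot, names in sorted(steps_by_bot(GENERATED).items()):
    print(bot, "->", ", ".join(names) or "(no test steps)")
```
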
In the [infradata/config] workspace (Google internal only, sorry):

* [configs/chromium-swarm/bots.cfg]
    * Defines a `Chrome-GPU` Swarming pool which contains most of the
      specialized hardware: as of this writing, the Windows and Linux NVIDIA
      bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
      GPUs. New GPU hardware should be added to this pool.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[configs/chromium-swarm/bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Adding-New-Tests-to-the-GPU-Bots

### How to add a new tester bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots violate
this rule – namely, the Debug bots – though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
Swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the Swarming bots is higher than it would be
if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot.
(Actually, a pair of bots -- Release and Debug. If deploying just one or the
other, ignore the other configuration.) These instructions assume that you are
reusing one of the existing builders, like
[`GPU FYI Win Builder`][GPU FYI Win Builder].

1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
    physical machines added to the Swarming pool. Use
    [chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
    to determine the PCI IDs of the GPUs in the bots. (These instructions will
    need to be updated for Android bots which don't have PCI buses.)

1. Make sure to add these new machines to the Chrome-GPU Swarming pool by
    creating a CL against [`configs/chromium-swarm/bots.cfg`][bots.cfg] in
    the [infradata/config] (Google internal) workspace. Set your Git
    `user.email` to an @google.com address if necessary. Here is an [example
    CL](https://chrome-internal-review.googlesource.com/524420).

1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for
    the testers. These need to match the OS of the physical machines and
    builders. For example, if you're adding a "Windows 7 CoolNewGPUType" tester,
    you'll need 2 Windows VMs. See this [example
    ticket](http://crbug.com/838975).

1. Once the VMs are ready, create a CL in the
    [`infradata/config`][infradata/config] (Google internal) workspace which
    does the following. Set your Git `user.email` to an @google.com address if
    necessary. Here's an [example
    CL](https://chrome-internal-review.googlesource.com/619497).
    1. Adds two new "bot_group" blocks in the Chromium GPU FYI section of
        [`configs/chromium-swarm/bots.cfg`][bots.cfg], one for the Release bot
        and one for the Debug bot. Copy the closest configuration you can find
        -- for example, Windows, Android, etc.
    1. Get this reviewed and landed. This step associates the VM with the bot's
        name on the waterfall.

1. Create a CL in the Chromium workspace which does the following. Here's an
    [example CL](https://chromium-review.googlesource.com/1041164).
    1. Adds the new machines to [waterfalls.pyl].
        1. The swarming dimensions are crucial. These must match the GPU and
            OS type of the physical hardware in the Swarming pool. This is what
            causes the VMs to spawn their tests on the correct hardware. Make
            sure to use the Chrome-GPU pool, and that the new machines were
            specifically added to that pool.
        1. Make triply sure that there are no collisions between the new
            hardware you're adding and hardware already in the Swarming pool.
            For example, it used to be the case that all of the Windows NVIDIA
            bots ran the same OS version. Later, the Windows 8 flavor bots were
            added. In order to avoid accidentally running tests on Windows 8
            when Windows 7 was intended, the OS in the swarming dimensions of
            the Win7 bots had to be changed from `win` to
            `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
            data center). Similarly, the Win8 bots had to have a very precise
            OS description (`Windows-2012ServerR2-SP0`).
        1. If you're deploying a new bot that's similar to another existing
            configuration, please search around in
            `src/testing/buildbot/test_suite_exceptions.pyl` for references to
            the other bot's name and see if your new bot needs to be added to
            any exclusion lists. For example, some of the tests don't run on
            certain Win bots because of missing OpenGL extensions.
    1. Runs [generate_buildbot_json.py] to regenerate
        `src/testing/buildbot/chromium.gpu.fyi.json`.
    1. Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the two new machines (Release and Debug) inside the
          luci.chromium.ci bucket. This sets up storage for the builds in the
          system. Use the appropriate mixin; for example, "win-gpu-fyi-ci" has
          already been set up for Windows GPU FYI bots on the waterfall.
    1. Updates [`luci-scheduler.cfg`][luci-scheduler.cfg]:
        * Add new "job" blocks for your new Release and Debug test bots. They
          should go underneath the builder which triggers them (like "GPU FYI
          Win Builder"), in alphabetical order. Make sure the "id" and
          "builder" entries match. This job block should use the acl_sets
          "triggered-by-parent-builders", because it's triggered by the
          builder, and not by changes to the git repository.
    1. Updates [`luci-milo.cfg`][luci-milo.cfg]:
        * Add new "builders" blocks for your new testers (Release and Debug)
          on the [`chromium.gpu.fyi`][chromium.gpu.fyi] console. Look at the
          short names and categories and try to come up with a reasonable
          organization.
    1. If you were adding a new builder, you would need to also add the new
        machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

1. After the Chromium-side CL lands it will take some time for all of
    the configuration changes to be picked up by the system. The bot
    will probably be in a red or purple state, claiming that it can't
    find its configuration. (It might also be in an "empty" state, not
    running any jobs at all.)

1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
    in the [`tools/build`][tools/build] workspace which does the
    following. Here's an [example
    CL](https://chromium-review.googlesource.com/1041145).
    1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
        `scripts/slave/recipe_modules/chromium_tests/`. Make sure to set the
        `serialize_tests` property to `True`. This is specified for waterfall
        bots, but not trybots, and helps avoid overloading the physical
        hardware. Double-check the `BUILD_CONFIG` and `parent_buildername`
        properties for each. They must match the Release/Debug flavor of the
        builder, like `GPU FYI Win Builder` vs. `GPU FYI Win Builder (dbg)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
        the newly-deployed waterfall bot, so it knows which JSON file to load
        out of src/testing/buildbot and which entry to look at.
    1. It used to be necessary to retrain recipe expectations
        (`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't
        appear to be necessary any more, but it's something to watch out for if
        your CL fails presubmit for some reason.

1. Note that it is crucial that the bot be deployed before hooking it up in the
    tools/build workspace. In the new LUCI world, if the parent builder can't
    find its child testers to trigger, that's a hard error on the parent. This
    will cause the builders to fail. You can and should prepare the tools/build
    CL in advance, but make sure it doesn't land until the bot's on the console.

[bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/cr-buildbucket.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/luci-milo.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/luci-scheduler.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder

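Since the swarming dimensions are the easiest part of these steps to get
wrong, it can help to sanity-check a new pair of machine entries before
uploading the Chromium-side CL. The following is a rough sketch, not part of
the real tooling: the fragment only imitates the general shape of a
waterfalls.pyl machine entry, the `swarming` and `dimension_sets` keys are
assumptions about that shape, and the bot names and GPU ID are made up. The
check flags missing dimensions, a pool other than Chrome-GPU, and the overly
loose `win` OS value described above.

```python
import ast

REQUIRED_DIMENSIONS = ("gpu", "os", "pool")

# Hypothetical fragment; names, GPU ID, and layout are illustrative only.
WATERFALL_FRAGMENT = """
{
  'Win7 FYI Release (CoolNewGPUType)': {
    'swarming': {
      'dimension_sets': [
        {'gpu': '1234:5678', 'os': 'Windows-2008ServerR2-SP1', 'pool': 'Chrome-GPU'},
      ],
    },
  },
  'Win7 FYI Debug (CoolNewGPUType)': {
    'swarming': {
      'dimension_sets': [
        {'gpu': '1234:5678', 'os': 'win', 'pool': 'Chrome-GPU'},
      ],
    },
  },
}
"""


def dimension_problems(machines, loose_os_values=("win",)):
    """Return human-readable problems found in each machine's dimensions."""
    problems = []
    for name, machine in sorted(machines.items()):
        for dims in machine["swarming"]["dimension_sets"]:
            missing = [key for key in REQUIRED_DIMENSIONS if key not in dims]
            if missing:
                problems.append("{}: missing {}".format(name, ", ".join(missing)))
            if dims.get("pool") != "Chrome-GPU":
                problems.append("{}: pool is not Chrome-GPU".format(name))
            if dims.get("os") in loose_os_values:
                problems.append("{}: os '{}' is too loose".format(name, dims["os"]))
    return problems


for problem in dimension_problems(ast.literal_eval(WATERFALL_FRAGMENT.strip())):
    print(problem)
```

Here only the hypothetical Debug entry is reported, because its `os` value
would also match other Windows flavors in the pool.
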
Kenneth Russell | 3a8e5c02 | 2018-05-04 21:14:49 | [diff] [blame] | 379 | ### How to start running tests on a new GPU type on an existing try bot |
Kai Ninomiya | a6429fb3 | 2018-03-30 01:30:56 | [diff] [blame] | 380 | |
| 381 | Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests |
| 382 | on CoolNewGPUType in addition to the types it currently runs (as of this |
| 383 | writing, NVIDIA and AMD). To do this: |
| 384 | |
1.  Make sure there is enough hardware capacity. Unfortunately, tools to report
    utilization of the Swarming pool are still being developed, but a
    back-of-the-envelope estimate is that you will need a minimum of 30
    machines in the Swarming pool to run the current set of GPU tests on the
    tryservers. We estimate that 90 machines will be needed in order to
    additionally run the WebGL 2.0 conformance tests. Plan for the larger
    capacity, since the goal is to run the larger test suite on as many
    configurations as possible.
2.  Deploy Release and Debug testers on the chromium.gpu waterfall, following
    the instructions for the chromium.gpu.fyi waterfall above. You will also
    need to temporarily add suppressions to
    [`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
    new testers, since they aren't yet covered by try bots and are going on a
    non-FYI waterfall. Make sure these run green for a day or two before
    proceeding.
3.  Create a CL in the tools/build workspace, adding the new Release tester
    to `win_chromium_rel_ng`'s `bot_ids` list
    in `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Rerun
    `scripts/slave/recipes.py --use-bootstrap test train`.
4.  Once the CL in (3) lands, the commit queue will **immediately** start
    running tests on the CoolNewGPUType configuration. Be vigilant and make
    sure that tryjobs are green. If they are red for any reason, revert the CL
    and figure out offline what went wrong.

[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py

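The change in step 3 can be sketched in Python as follows. This is a minimal
illustration, not the actual contents of `trybots.py`: only
`win_chromium_rel_ng` and `bot_ids` come from the text above. The dictionary
layout, the `mastername`/`buildername`/`tester` keys, the existing entries, and
the builder and tester names used for CoolNewGPUType are assumptions made for
the example.

```python
import copy

# Hypothetical excerpt of a trybot table. The real trybots.py may be laid out
# differently; the NVIDIA and AMD entries below are placeholders.
TRYBOTS = {
    'win_chromium_rel_ng': {
        'bot_ids': [
            {'mastername': 'chromium.gpu',
             'buildername': 'GPU Win Builder',
             'tester': 'Win7 Release (NVIDIA)'},
            {'mastername': 'chromium.gpu',
             'buildername': 'GPU Win Builder',
             'tester': 'Win7 Release (AMD)'},
        ],
    },
}


def add_tester(trybots, trybot, mastername, buildername, tester):
    """Returns a copy of `trybots` in which `trybot` mirrors one more tester."""
    updated = copy.deepcopy(trybots)
    bot_ids = updated[trybot]['bot_ids']
    new_id = {'mastername': mastername,
              'buildername': buildername,
              'tester': tester}
    # Adding the same tester twice would make the try bot run its tests twice.
    if new_id not in bot_ids:
        bot_ids.append(new_id)
    return updated


updated = add_tester(TRYBOTS, 'win_chromium_rel_ng', 'chromium.gpu',
                     'GPU Win Builder', 'Win7 Release (CoolNewGPUType)')
for bot_id in updated['win_chromium_rel_ng']['bot_ids']:
    print(bot_id['tester'])
```

Every tester listed under `bot_ids` runs on each CL sent to the try bot, which
is why the capacity check in step 1 has to happen before this entry lands.
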
### How to add a new manually-triggered trybot

There are a lot of one-off GPU types on the chromium.gpu.fyi waterfall, and
sometimes a failure happens on just one type. It's helpful to be able to send a
tryjob to a particular machine. Doing so requires a specific trybot to be set
up, because most if not all of the existing trybots trigger tests on more than
one type of GPU.

Here are the steps to set up a new trybot which runs tests on just one
particular GPU type. As an example, consider adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu_manual_try_win7_nvidia_rel`.

1.  File a Chrome Infrastructure Labs ticket requesting ~3 virtual
    machines. These will do builds and trigger jobs on the physical hardware,
    and need to match the OS of the physical machines. See this [example
    ticket](http://crbug.com/839216).

1.  Once the VMs are ready, create a CL in the
    [`infradata/config`][infradata/config] (Google internal) workspace which
    does the following. Set your Git `user.email` to your @google.com address
    if necessary. Here's an [example
    CL](https://chrome-internal-review.googlesource.com/620773).
    1.  Adds a new "bot_group" block in the "manually-triggered GPU trybots"
        section of [`configs/chromium-swarm/bots.cfg`][bots.cfg]. Look in the
        optional GPU tryserver section for the closest configuration you can
        find to copy from, for example Windows or Android
        (`win_optional_gpu_tests_rel`, `android_optional_gpu_tests_rel`). The
        "dimensions" tag contains the name of the trybot,
        e.g. "builder:gpu_manual_try_win7_nvidia_rel".
    1.  Get this reviewed and landed. This step makes these machines the ones
        which perform the builds for this new trybot.

1.  Create a CL in the Chromium workspace which does the following. Here's an
    [example CL](https://chromium-review.googlesource.com/1044767).
    1.  Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        *   Add the new trybot to the `luci.chromium.try` bucket. This is a
            one-liner, with "name" being "gpu_manual_try_win7_nvidia_rel" and
            "mixins" being the OS-appropriate mixin, in this case
            "win-optional-gpu-try". (We're repurposing the existing ACLs for the
            "optional" GPU trybots for these manually-triggered ones.)
    1.  Updates [`luci-milo.cfg`][luci-milo.cfg]:
        *   Add "builders" blocks for the new trybot to the `luci.chromium.try`
            and `tryserver.chromium.win` consoles.
    1.  Adds the new trybot to
        [`src/tools/mb/mb_config.pyl`][mb_config.pyl]. Reuse the same mixin as
        for the optional GPU trybot; in this case,
        `gpu_fyi_tests_release_trybot_x86`.
    1.  Get this CL reviewed and landed.

1.  Create a CL in the [`tools/build`][tools/build] workspace which does the
    following. Here's an [example
    CL](https://chromium-review.googlesource.com/1044761).

    1.  Adds the new trybot to a "Manually-triggered GPU trybots" section in
        `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Create this
        section after the "Optional GPU bots" section for the appropriate
        tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
        `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
        mirror the appropriate waterfall bot; in this case, the buildername to
        mirror is `GPU FYI Win Builder` and the tester is `Win7 FYI Release
        (NVIDIA)`.
    1.  Adds an exception for your new trybot in `tests/masters_recipes_test.py`,
        under `FAKE_BUILDERS`, under the appropriate tryserver waterfall (in
        this case, `master.tryserver.chromium.win`). The exception is needed
        because this is a LUCI-only bot, and this test verifies the old
        buildbot configurations.
    1.  Get this reviewed and landed. This step tells the Chromium recipe about
        the newly-deployed trybot, so it knows which JSON file to load out of
        `src/testing/buildbot` and which entry to look at to understand which
        tests to run and on what physical hardware.
    1.  It used to be necessary to retrain recipe expectations
        (`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't
        appear to be necessary any more, but it's something to watch out for if
        your CL fails presubmit for some reason.

At this point the new trybot should automatically show up in the
"Choose tryjobs" pop-up in the Gerrit UI, under the
`luci.chromium.try` heading, because it was deployed via LUCI. It
should be possible to send a CL to it.

(It should not be necessary to modify buildbucket.config, as
mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
chrome-infra team if this doesn't work as expected.)

[chromium/src]: https://chromium-review.googlesource.com/q/project:chromium%252Fsrc+status:open
[go/chromecals]: http://go/chromecals

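The values that have to agree across the three CLs above can be collected in
one place. The following Python sketch is only a checklist in code form: the
helper `manual_trybot_settings` and the shape of the dictionary it returns are
hypothetical and do not correspond to the real file formats. The names and
values passed in are the ones used in the example above.

```python
def manual_trybot_settings(name, os_mixin, tryserver_console, mb_mixin,
                           builder, tester):
    """Groups, per file, the values described in the steps above.

    The outer keys are the files that get edited. The inner keys are labels
    chosen for this sketch, not the literal field names in those files.
    """
    return {
        # The "dimensions" tag of the new bot_group names the trybot.
        'bots.cfg': {'dimensions': 'builder:' + name},
        # One entry in the luci.chromium.try bucket.
        'cr-buildbucket.cfg': {'bucket': 'luci.chromium.try',
                               'name': name,
                               'mixins': os_mixin},
        # A "builders" block in each of these consoles.
        'luci-milo.cfg': {'consoles': ['luci.chromium.try',
                                       tryserver_console]},
        # Reuses the mixin of the corresponding optional GPU trybot.
        'mb_config.pyl': {'builder': name, 'mixin': mb_mixin},
        # The waterfall builder and tester that the trybot mirrors.
        'trybots.py': {'trybot': name,
                       'buildername': builder,
                       'tester': tester},
    }


settings = manual_trybot_settings(
    name='gpu_manual_try_win7_nvidia_rel',
    os_mixin='win-optional-gpu-try',
    tryserver_console='tryserver.chromium.win',
    mb_mixin='gpu_fyi_tests_release_trybot_x86',
    builder='GPU FYI Win Builder',
    tester='Win7 FYI Release (NVIDIA)',
)
for filename, values in settings.items():
    print(filename, values)
```

The trybot name appears in four of the five files, so a typo in any one of
them is a likely cause if the new bot does not pick up jobs.
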
### How to add a new "optional" try bot

TODO(kbr): the naming of the "optional" try bots is confusing and
unfortunate. They should probably be renamed to something like "extratests" or
"extra_tests", so perhaps a new naming convention of "gpu_win_extratests_rel" or
"win_gpu_extratests_rel". Unfortunately, making this change at this point
requires touching tons of files across many workspaces and is unlikely to happen
unless someone highly motivated wants to pick up the task.

The "optional" GPU try bots are a concession to the reality that there are some
long-running GPU test suites that simply cannot run against every Chromium CL.
They run some additional tests that are usually run only on the
chromium.gpu.fyi waterfall. Some of these tests, like the WebGL 2.0 conformance
suite, are intended to be run on the normal try bots once hardware capacity is
available. Some are not intended to ever run on the normal try bots.

The optional try bots are a little different because they mirror waterfall bots
that don't actually exist. The waterfall bots' specifications exist only to
tell the optional try bots which tests to run.

Let's say that you intend to add a new such optional try bot on Windows. Call
it `win_new_optional_tests_rel`, for example. If you only wanted to add a new
GPU type to the existing `win_optional_gpu_tests_rel` try bot, you'd
just follow the instructions above
([How to start running tests on a new GPU type on an existing try bot](#How-to-start-running-tests-on-a-new-GPU-type-on-an-existing-try-bot)).
The steps below describe how to spin up an entirely new optional try bot.

1.  Make sure that you have some swarming capacity for the new GPU type. Since
    it's not running against all Chromium CLs you don't need the recommended
    minimum of 30 bots, though ~10 would be good.
1.  Create a CL in the Chromium workspace:
    1.  Add your new bot (for example, "Optional Win7 Release
        (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
        [waterfalls.pyl]. (Note, this is a bad example: the
        "optional" bots have special semantics in this script. You'd probably
        want to define some new category of bot if you didn't intend to add
        this to `win_optional_gpu_tests_rel`.)
    1.  Re-run the script to regenerate the JSON files.
1.  Land the above CL.
1.  Create a CL in the tools/build workspace:
    1.  Modify `masters/master.tryserver.chromium.win`'s [master.cfg] and
        [slaves.cfg] to add the new tryserver. Follow the pattern for the
        existing `win_optional_gpu_tests_rel` tryserver. Namely, add the new
        entry to `master.cfg`, and add the new tryserver to the
        `optional_builders` list in `slaves.cfg`.
    1.  Modify [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] to add the new
        "Optional Win7 Release (CoolNewGPUType)" entry.
    1.  Modify [`trybots.py`][trybots.py] to add
        the new `win_new_optional_tests_rel` try bot, mirroring "Optional
        Win7 Release (CoolNewGPUType)".
1.  Land the above CL and request an off-hours restart of the
    tryserver.chromium.win waterfall.
1.  Now you can send CLs to the new bot with:
    `git cl try -m tryserver.chromium.win -b win_new_optional_tests_rel`

[master.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/master.cfg
[slaves.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/slaves.cfg

### How to test and deploy a driver update

Let's say that you want to roll out an update to the graphics drivers on one of
the configurations like the Win7 NVIDIA bots. The responsible way to do this is
to run the new driver on one of the waterfalls for a day or two to make sure
the tests are running reliably green before rolling out the driver update
everywhere. To do this:

1.  Make sure that all of the current Swarming jobs for this OS and GPU
    configuration are targeted at the "stable" version of the driver in
    [waterfalls.pyl].
1.  File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of the
    physical machines already in the Swarming pool upgraded to the new version
    of the driver.
1.  If an "experimental" version of this bot doesn't yet exist, follow the
    instructions above for [How to add a new tester bot to the chromium.gpu.fyi
    waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
    to deploy one.
1.  Have this experimental bot target the new version of the driver in
    [waterfalls.pyl].
1.  Hopefully, the new machine will pass the pixel tests. If it doesn't, it will
    be necessary to follow the instructions on
    [updating the pixel tests] to temporarily suppress the failures on this
    particular configuration. Keep the time window for these test suppressions
    as narrow as possible.
1.  Watch the new machine for a day or two to make sure it's stable.
1.  When it is, update [waterfalls.pyl] to use the
    "gpu trigger script" functionality to select *either* the stable *or* the
    new driver version on the stable version of the bot. See [this
    CL](https://chromium-review.googlesource.com/1189059) for an example, though
    that CL was targeting a different OS version rather than driver version.
1.  After that lands, ask the Chrome Infrastructure Labs team to roll out the
    driver update across all of the similarly configured bots in the swarming
    pool.
1.  If necessary, update pixel test expectations and remove the suppressions
    added above.
1.  Remove the alternate swarming dimensions for the stable bot from
    [waterfalls.pyl], locking it to the new driver version. See [this
    CL](https://chromium-review.googlesource.com/1197329) for an example, though
    that CL was targeting a different OS version rather than driver version.

Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as
well as the current stable ones.

[updating the pixel tests]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots

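The three states that the stable bot passes through during a driver rollout
can be modeled with a short Python sketch. It illustrates the reasoning only
and is not the format of [waterfalls.pyl] or of the trigger script: the
dimension keys (`os`, `gpu`, `driver`), their values, and both helper functions
are hypothetical.

```python
def rollout_phases(base, stable_driver, new_driver):
    """Returns, per phase, the dimension sets the stable bot will accept."""
    stable = dict(base, driver=stable_driver)
    new = dict(base, driver=new_driver)
    return [
        # Before the rollout: jobs are pinned to the stable driver.
        ('stable only', [stable]),
        # While the fleet is being upgraded: either driver is acceptable.
        ('stable or new', [stable, new]),
        # After the rollout: locked to the new driver.
        ('new only', [new]),
    ]


def can_run(machine, accepted):
    """True if the machine matches every dimension of any accepted set."""
    return any(
        all(machine.get(key) == value for key, value in dims.items())
        for dims in accepted)


# Hypothetical dimension values, for illustration only.
BASE = {'os': 'Windows-7', 'gpu': 'nvidia'}
phases = rollout_phases(BASE, 'driver-old', 'driver-new')
not_upgraded = dict(BASE, driver='driver-old')
upgraded = dict(BASE, driver='driver-new')

for label, accepted in phases:
    print(label, can_run(not_upgraded, accepted), can_run(upgraded, accepted))
```

Only the middle phase lets both upgraded and not-yet-upgraded machines take
jobs, which is why it is put in place before asking for the fleet-wide upgrade
and removed only after the upgrade has finished.
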
## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

*   `./src/tools/swarming_client/auth.py login
    --service=https://isolateserver.appspot.com`

This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:

*   `./src/tools/swarming_client/auth.py logout
    --service=https://isolateserver.appspot.com`

### Swarming server credentials

The swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.

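Since the login and logout commands for the two services differ only in the
`--service` URL, they can be generated from a small table. The sketch below
only builds the argument lists and does not run anything. The script path,
subcommands, and URLs are the ones given above; the helper name `auth_command`
and the short service labels are made up for this example.

```python
AUTH_SCRIPT = './src/tools/swarming_client/auth.py'

# Short labels (hypothetical) mapped to the service URLs given above.
SERVICES = {
    'isolate': 'https://isolateserver.appspot.com',
    'swarming': 'https://chromium-swarm.appspot.com',
}


def auth_command(action, service):
    """Builds the auth.py argument list for logging in to or out of a service."""
    if action not in ('login', 'logout'):
        raise ValueError('action must be "login" or "logout"')
    return [AUTH_SCRIPT, action, '--service=' + SERVICES[service]]


for name in sorted(SERVICES):
    print(' '.join(auth_command('login', name)))
```

The returned list can be passed to `subprocess.check_call` from the directory
that contains `src`, which is where the relative script path above resolves.
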
### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1.  Run `gsutil config`
2.  Copy and paste the URL into your browser
3.  Log in with your @google.com account
4.  Allow the app to access the information it requests
5.  Copy and paste the resulting key back into your terminal
6.  Press "enter" when prompted for a project-id (i.e., leave it empty)

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil