# GPU Bot Details

This page describes in detail how the GPU bots are set up, which files affect
their configuration, and how to both modify their behavior and add new bots.

[TOC]

## Overview of the GPU bots' setup

Chromium's GPU bots, unlike the majority of the project's test machines, are
physical pieces of hardware. When end users run the Chrome browser, they are
almost surely running it on a physical machine with a real graphics processor.
Some portions of the code base simply cannot be exercised by running the
browser in a virtual machine, or on a software implementation of the
underlying graphics libraries. The GPU bots were developed and deployed to
cover these code paths, and to avoid regressions that are otherwise inevitable
in a project the size of the Chromium browser.

The GPU bots are used on the [chromium.gpu] and [chromium.gpu.fyi]
waterfalls, and on various tryservers, as described in [Using the GPU Bots].

[chromium.gpu]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console
[chromium.gpu.fyi]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console
[Using the GPU Bots]: gpu_testing.md#Using-the-GPU-Bots

All of the physical hardware for the bots lives in the Swarming pool, and most
of it in the Chrome-GPU Swarming pool. The waterfall bots are simply virtual
machines which spawn Swarming tasks with the appropriate tags to get them to run
on the desired GPU and operating system type. So, for example, the
[Win10 Release (NVIDIA)] bot is actually a virtual machine which spawns all of
its jobs with the Swarming parameters:

[Win10 Release (NVIDIA)]: https://ci.chromium.org/buildbot/chromium.gpu/Win10%20Release%20%28NVIDIA%29/?limit=200

```json
{
  "gpu": "10de:1cb3-23.21.13.8816",
  "os": "Windows-10",
  "pool": "Chrome-GPU"
}
```

Since the GPUs in the Swarming pool are mostly homogeneous, this is sufficient
to target the pool of Windows 10-like NVIDIA machines. (There are a few
Windows 7-like NVIDIA bots in the pool, which necessitates the OS specifier.)
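
The following minimal Python sketch shows why the OS specifier matters by modeling Swarming-style dimension matching. It assumes that a bot is eligible for a task only if it advertises every key/value pair the task requests. The bot names and their multi-valued dimensions are hypothetical and were chosen to mirror the Windows 10 / Windows 7-like situation described above.

```python
# Hypothetical bots: each dimension maps to the list of values the bot offers.
BOTS = {
    "win10-nvidia-1": {
        "gpu": ["10de", "10de:1cb3", "10de:1cb3-23.21.13.8816"],
        "os": ["Windows", "Windows-10"],
        "pool": ["Chrome-GPU"],
    },
    "win7-nvidia-1": {
        "gpu": ["10de", "10de:1cb3", "10de:1cb3-23.21.13.8816"],
        "os": ["Windows", "Windows-2008ServerR2-SP1"],
        "pool": ["Chrome-GPU"],
    },
}


def bot_matches(task_dims, bot_dims):
    """Assumed rule: every requested value must be offered by the bot."""
    return all(value in bot_dims.get(key, []) for key, value in task_dims.items())


def matching_bots(task_dims):
    return sorted(name for name, dims in BOTS.items() if bot_matches(task_dims, dims))


without_os = {"gpu": "10de:1cb3-23.21.13.8816", "pool": "Chrome-GPU"}
with_os = dict(without_os, os="Windows-10")

print(matching_bots(without_os))  # both bots match
print(matching_bots(with_os))     # only the Windows 10 bot matches
```

If the `os` key is dropped, the task can also land on the Windows 7-like bot, because both machines report the same GPU.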

Details about the bots can be found on [chromium-swarm.appspot.com] and by
using `src/tools/swarming_client/swarming.py`, for example `swarming.py bots`.
If you are authenticated with @google.com credentials, you can query the bots
and see, for example, which GPUs are available.

[chromium-swarm.appspot.com]: https://chromium-swarm.appspot.com/
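
The `gpu` dimension packs several pieces of information into one string. The hypothetical helper below splits a value such as `10de:1cb3-23.21.13.8816` into a PCI vendor ID, a device ID and a driver version. It assumes the `<vendor>:<device>-<driver>` layout seen in the JSON example on this page. The vendor table covers only the well-known PCI vendor IDs for NVIDIA (`10de`), AMD (`1002`) and Intel (`8086`).

```python
# Well-known PCI vendor IDs; anything else is reported as "unknown".
VENDOR_NAMES = {"10de": "NVIDIA", "1002": "AMD", "8086": "Intel"}


def parse_gpu_dimension(value):
    """Split a "<vendor>:<device>-<driver>" string into its parts.

    Missing parts (e.g. a vendor-only value like "10de") come back as None.
    """
    pci_id, _, driver = value.partition("-")
    vendor_id, _, device_id = pci_id.partition(":")
    return {
        "vendor_id": vendor_id,
        "vendor": VENDOR_NAMES.get(vendor_id.lower(), "unknown"),
        "device_id": device_id or None,
        "driver": driver or None,
    }


print(parse_gpu_dimension("10de:1cb3-23.21.13.8816"))
# {'vendor_id': '10de', 'vendor': 'NVIDIA', 'device_id': '1cb3', 'driver': '23.21.13.8816'}
```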

The waterfall bots run tests on a single GPU type in order to make it easier to
see regressions or flakiness that affect only a certain type of GPU.

Tryservers such as `win_chromium_rel_ng` that include GPU tests, on the other
hand, run tests on more than one GPU type. As of this writing, the Windows
tryservers run tests on NVIDIA and AMD GPUs, and the Mac tryservers run tests on
Intel and NVIDIA GPUs. These tryservers' tests are specified simply by
*mirroring* how one or more waterfall bots work. This is an inherent property
of the [`chromium_trybot` recipe][chromium_trybot.py], which was designed to
eliminate differences in behavior between the tryservers and waterfall bots.
Because the tryservers mirror waterfall bots, if the waterfall bot is working,
the tryserver should almost certainly be working as well.

[chromium_trybot.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipes/chromium_trybot.py
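
The mirroring idea can be modeled in a few lines of Python. This is a simplified sketch and not the recipe's actual data model. It assumes that a try bot's steps are just the union of the steps of the waterfall testers it mirrors. The tester names and step lists are hypothetical.

```python
# Hypothetical waterfall configuration: tester name -> steps it runs.
WATERFALL_STEPS = {
    "Win10 Release (NVIDIA)": ["angle_end2end_tests", "gl_tests", "webgl_conformance"],
    "Win10 Release (AMD)": ["gl_tests", "webgl_conformance"],
}

# Hypothetical try bot configuration: try bot -> waterfall testers it mirrors.
TRYBOT_MIRRORS = {
    "win_chromium_rel_ng": ["Win10 Release (NVIDIA)", "Win10 Release (AMD)"],
}


def trybot_steps(trybot):
    """List the (tester, step) pairs a try bot runs.

    Nothing here is try-bot specific: every step comes from the waterfall
    configuration, so a green waterfall strongly suggests a green try bot.
    """
    return [
        (tester, step)
        for tester in TRYBOT_MIRRORS[trybot]
        for step in WATERFALL_STEPS[tester]
    ]


for tester, step in trybot_steps("win_chromium_rel_ng"):
    print(f"{tester}: {step}")
```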

There are a few one-off GPU configurations on the waterfall where the tests are
run locally on physical hardware, rather than via Swarming. A few examples are:

<!-- XXX: update this list -->
* [Mac Pro Release (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Release%20%28AMD%29/)
* [Mac Pro Debug (AMD)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Mac%20Pro%20Debug%20%28AMD%29/)
* [Linux Release (Intel HD 630)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28Intel%20HD%20630%29/)
* [Linux Release (AMD R7 240)](https://luci-milo.appspot.com/buildbot/chromium.gpu.fyi/Linux%20Release%20%28AMD%20R7%20240%29/)

There are a couple of reasons to continue to support running tests on a
specific machine: it might be too expensive to deploy the required multiple
copies of said hardware, or the configuration might not be reliable enough to
begin scaling it up.

## Adding a new isolated test to the bots

Adding a new test step to the bots requires that the test run via an isolate.
Isolates describe both the binary and data dependencies of an executable, and
are the underpinning of how the Swarming system works. See the [LUCI wiki] for
background on Isolates and Swarming.

<!-- XXX: broken link -->
[LUCI wiki]: https://github.com/luci/luci-py/wiki

### Adding a new isolate

1. Define your target using the `template("test")` template in
   [`src/testing/test.gni`][testing/test.gni]. See `test("gl_tests")` in
   [`src/gpu/BUILD.gn`][gpu/BUILD.gn] for an example. For a more complex
   example which invokes a series of scripts that finally launch the
   browser, see [`src/chrome/telemetry_gpu_test.isolate`][telemetry_gpu_test.isolate].
2. Add an entry to [`src/testing/buildbot/gn_isolate_map.pyl`][gn_isolate_map.pyl]
   that refers to your target. Find a target similar to yours in order to
   determine the `type`. The type is referenced in
   [`src/tools/mb/mb_config.pyl`][mb_config.pyl].

[testing/test.gni]: https://chromium.googlesource.com/chromium/src/+/master/testing/test.gni
[gpu/BUILD.gn]: https://chromium.googlesource.com/chromium/src/+/master/gpu/BUILD.gn
<!-- XXX: broken link -->
[telemetry_gpu_test.isolate]: https://chromium.googlesource.com/chromium/src/+/master/chrome/telemetry_gpu_test.isolate
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
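
Because `.pyl` files are Python literals, you can sanity-check an entry with `ast.literal_eval` before sending the CL. The entry below is hypothetical: `my_gpu_tests` and `//gpu:my_gpu_tests` are made-up names, and the `console_test_launcher` type is an assumption. Copy the real `type` from a similar existing target, as described above.

```python
import ast

# Hypothetical gn_isolate_map.pyl entry. The key is the isolate name, "label"
# is the GN target, and "type" is copied from a similar existing target.
ENTRY_TEXT = """
{
  "my_gpu_tests": {
    "label": "//gpu:my_gpu_tests",
    "type": "console_test_launcher",
  },
}
"""


def check_isolate_entries(text):
    """Parse a .pyl snippet and check each entry has a GN label and a type."""
    entries = ast.literal_eval(text.strip())
    for name, entry in entries.items():
        label = entry.get("label", "")
        # This sketch only accepts the explicit "//dir:target" label form.
        if not label.startswith("//") or ":" not in label:
            raise ValueError(f"{name}: {label!r} is not a GN label")
        if "type" not in entry:
            raise ValueError(f"{name}: missing 'type'")
    return sorted(entries)


print(check_isolate_entries(ENTRY_TEXT))  # ['my_gpu_tests']
```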

At this point you can build your isolate and upload it to the isolate server.

See [Isolated Testing for SWEs] for the most up-to-date instructions. The
instructions below are a copy of them, and show how to run an isolate that has
been uploaded to the isolate server on your local machine rather than on
Swarming.

[Isolated Testing for SWEs]: https://www.chromium.org/developers/testing/isolated-testing/for-swes

From the `src/` directory:

1. `./tools/mb/mb.py isolate //out/Release [target name]`
    * For example: `./tools/mb/mb.py isolate //out/Release angle_end2end_tests`
1. `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/[target name].isolated.gen.json`
    * For example: `python tools/swarming_client/isolate.py batcharchive -I https://isolateserver.appspot.com out/Release/angle_end2end_tests.isolated.gen.json`
1. This writes a hash to stdout. You can run the isolate via:
   `python tools/swarming_client/run_isolated.py -I https://isolateserver.appspot.com -s [HASH] -- [any additional args for the isolate]`

See the section below on [isolate server credentials](#Isolate-server-credentials).
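
If you repeat these steps often, you can wrap them in a small script. The Python sketch below only assembles the command lines listed above. It neither runs them nor captures the printed hash. The function names are hypothetical, the sketch assumes Python 3.8+ for `shlex.join`, and it expects to be run from `src/`.

```python
import shlex

ISOLATE_SERVER = "https://isolateserver.appspot.com"


def isolate_commands(target, out_dir="out/Release"):
    """Commands for the first two steps: build the isolate, then upload it."""
    return [
        ["./tools/mb/mb.py", "isolate", "//" + out_dir, target],
        ["python", "tools/swarming_client/isolate.py", "batcharchive",
         "-I", ISOLATE_SERVER, f"{out_dir}/{target}.isolated.gen.json"],
    ]


def run_isolated_command(isolated_hash, extra_args=()):
    """Command for the last step: run the uploaded isolate locally by hash."""
    return ["python", "tools/swarming_client/run_isolated.py",
            "-I", ISOLATE_SERVER, "-s", isolated_hash, "--", *extra_args]


# Print the build and upload commands for one target.
for cmd in isolate_commands("angle_end2end_tests"):
    print(shlex.join(cmd))
```

To execute the last step, pass the hash printed by the upload to `run_isolated_command` and hand the resulting list to `subprocess.run(..., check=True)`.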

### Adding your new isolate to the tests that are run on the bots

See [Adding new steps to the GPU bots] for details on this process.

[Adding new steps to the GPU bots]: gpu_testing.md#Adding-new-steps-to-the-GPU-Bots

## Relevant files that control the operation of the GPU bots

In the [tools/build] workspace:

* [masters/master.chromium.gpu] and [masters/master.chromium.gpu.fyi]:
    * `builders.pyl` in these two directories defines the bots that show up on
      the waterfall. If you are adding a new bot, you need to add it to
      `builders.pyl` and use go/bug-a-trooper to request a restart of either
      master.chromium.gpu or master.chromium.gpu.fyi.
    * Only changes under `masters/` require a waterfall restart. All other
      changes (for example, to `scripts/slave/` in this workspace, or to the
      Chromium workspace) do not require a master restart, and go live the
      minute they are committed.
* `scripts/slave/recipe_modules/chromium_tests/`:
    * <code>[chromium_gpu.py]</code> and
      <code>[chromium_gpu_fyi.py]</code> define the following for
      each builder and tester:
        * How the workspace is checked out (e.g., this is where top-of-tree
          ANGLE is specified)
        * The build configuration (e.g., this is where 32-bit vs. 64-bit is
          specified)
        * Various gclient defines (like compiling in the hardware-accelerated
          video codecs, and enabling compilation of certain tests, like the
          dEQP tests, that can't be built on all of the Chromium builders)
        * Note that the GN configuration of the bots is also controlled by
          <code>[mb_config.pyl]</code> in the Chromium workspace; see below.
    * <code>[trybots.py]</code> defines how try bots *mirror* one or more
      waterfall bots.
        * The concept of try bots mirroring waterfall bots ensures there are
          no differences in behavior between the waterfall bots and the try
          bots. This helps ensure that a CL will not pass the commit queue
          and then break on the waterfall.
        * This file defines the behavior of the following GPU-related try
          bots:
            * `linux_chromium_rel_ng`, `mac_chromium_rel_ng`, and
              `win_chromium_rel_ng`, which run against every Chromium CL, and
              which mirror the behavior of bots on the chromium.gpu
              waterfall.
            * The ANGLE try bots, which run against ANGLE CLs, and mirror the
              behavior of the chromium.gpu.fyi waterfall (including using
              top-of-tree ANGLE, and running additional tests not run by the
              regular Chromium try bots).
            * The optional GPU try servers `linux_optional_gpu_tests_rel`,
              `mac_optional_gpu_tests_rel` and
              `win_optional_gpu_tests_rel`, which are triggered manually and
              run some tests which can't be run on the regular Chromium try
              servers, mainly due to lack of hardware capacity.

[tools/build]: https://chromium.googlesource.com/chromium/tools/build/
[masters/master.chromium.gpu]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu/
[masters/master.chromium.gpu.fyi]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.chromium.gpu.fyi/
[chromium_gpu.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu.py
[chromium_gpu_fyi.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py
[trybots.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/scripts/slave/recipe_modules/chromium_tests/trybots.py
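
The restart rule above is mechanical enough to check automatically. The hypothetical Python helper below takes the paths touched by a tools/build CL and reports which master directories, and therefore which waterfalls, would need a restart. A path outside `masters/` needs no restart.

```python
def masters_to_restart(changed_paths):
    """Return the masters/ subdirectories touched by a tools/build CL.

    Only changes under masters/ need a waterfall restart; everything else
    (e.g. scripts/slave/) goes live as soon as it is committed.
    """
    masters = set()
    for path in changed_paths:
        parts = path.split("/")
        if len(parts) > 1 and parts[0] == "masters":
            masters.add(parts[1])
    return sorted(masters)


cl_paths = [
    "masters/master.chromium.gpu.fyi/builders.pyl",
    "scripts/slave/recipe_modules/chromium_tests/chromium_gpu_fyi.py",
]
print(masters_to_restart(cl_paths))  # ['master.chromium.gpu.fyi']
```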

In the [chromium/src] workspace:

* [src/testing/buildbot]:
    * <code>[chromium.gpu.json]</code> and
      <code>[chromium.gpu.fyi.json]</code> define which steps are run on
      which bots. These files are autogenerated. Don't modify them directly!
    * <code>[gn_isolate_map.pyl]</code> defines all of the isolates' behavior in the GN
      build.
* [`src/tools/mb/mb_config.pyl`][mb_config.pyl]
    * Defines the GN arguments for all of the bots.
* [`src/testing/buildbot/generate_buildbot_json.py`][generate_buildbot_json.py]
    * The generator script for all the waterfalls, including `chromium.gpu.json` and
      `chromium.gpu.fyi.json`. It defines on which GPUs various tests run.
    * See the [README for generate_buildbot_json.py] for documentation
      on this script and the descriptions of the waterfalls and test suites.
    * When modifying this script, don't forget to also run it, to regenerate
      the JSON files. Don't worry; the presubmit step will catch this if you forget.
    * See [Adding new steps to the GPU bots] for more details.

[chromium/src]: https://chromium.googlesource.com/chromium/src/
[src/testing/buildbot]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot
[chromium.gpu.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.json
[chromium.gpu.fyi.json]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/chromium.gpu.fyi.json
[gn_isolate_map.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/gn_isolate_map.pyl
[mb_config.pyl]: https://chromium.googlesource.com/chromium/src/+/master/tools/mb/mb_config.pyl
[generate_buildbot_json.py]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/generate_buildbot_json.py
[waterfalls.pyl]: https://chromium.googlesource.com/chromium/src/+/master/testing/buildbot/waterfalls.pyl
[README for generate_buildbot_json.py]: ../../testing/buildbot/README.md
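
The JSON files should be regenerated, never edited by hand. To show why, here is a heavily simplified Python model of what a generator like `generate_buildbot_json.py` does: it expands each machine's test suites into concrete tests and drops the tests excluded for that machine. The data structures and names are hypothetical and much smaller than the real `.pyl` inputs. See the README linked above for the actual format.

```python
import json

# Hypothetical, simplified inputs in the spirit of waterfalls.pyl and
# test_suite_exceptions.pyl.
WATERFALL = {
    "name": "chromium.gpu.fyi",
    "machines": {
        "Win10 FYI Release (NVIDIA)": {"test_suites": ["gpu_gtests"]},
        "Win7 FYI Release (AMD)": {"test_suites": ["gpu_gtests"]},
    },
}
TEST_SUITES = {"gpu_gtests": ["angle_end2end_tests", "gl_tests"]}
EXCEPTIONS = {"gl_tests": {"remove_from": ["Win7 FYI Release (AMD)"]}}


def generate(waterfall):
    """Expand each machine's suites into a concrete, per-machine test list."""
    result = {}
    for machine, config in sorted(waterfall["machines"].items()):
        tests = []
        for suite in config["test_suites"]:
            for test in TEST_SUITES[suite]:
                removed = EXCEPTIONS.get(test, {}).get("remove_from", [])
                if machine not in removed:
                    tests.append({"test": test})
        result[machine] = {"gtest_tests": tests}
    return result


print(json.dumps(generate(WATERFALL), indent=2, sort_keys=True))
```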

In the [infradata/config] workspace (Google internal only, sorry):

* [configs/chromium-swarm/bots.cfg]
    * Defines a `Chrome-GPU` Swarming pool which contains most of the
      specialized hardware: as of this writing, the Windows and Linux NVIDIA
      bots, the Windows AMD bots, and the MacBook Pros with NVIDIA and AMD
      GPUs. New GPU hardware should be added to this pool.

[infradata/config]: https://chrome-internal.googlesource.com/infradata/config
[configs/chromium-swarm/bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg

## Walkthroughs of various maintenance scenarios

This section describes various common scenarios that might arise when
maintaining the GPU bots, and how they'd be addressed.

### How to add a new test or an entire new step to the bots

This is described in [Adding new tests to the GPU bots].

[Adding new tests to the GPU bots]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Adding-New-Tests-to-the-GPU-Bots

### How to add a new tester bot to the chromium.gpu.fyi waterfall

When deploying a new GPU configuration, it should be added to the
chromium.gpu.fyi waterfall first. The chromium.gpu waterfall should be reserved
for those GPUs which are tested on the commit queue. (Some of the bots, namely
the Debug bots, violate this rule, though we should strive to eliminate these
differences.) Once the new configuration is ready to be fully deployed on
tryservers, bots can be added to the chromium.gpu waterfall, and the tryservers
changed to mirror them.

In order to add Release and Debug waterfall bots for a new configuration,
experience has shown that at least 4 physical machines are needed in the
Swarming pool. The reason is that the tests all run in parallel on the Swarming
cluster, so the load induced on the Swarming bots is higher than it would be
if the tests were run strictly serially.

With these prerequisites, these are the steps to add a new (swarmed) tester bot.
(Actually, a pair of bots: Release and Debug. If deploying only one or the
other, ignore the other configuration.) These instructions assume that you are
reusing one of the existing builders, like [`GPU FYI Win Builder`][GPU FYI Win Builder].

1. Work with the Chrome Infrastructure Labs team to get the (minimum 4)
   physical machines added to the Swarming pool. Use
   [chromium-swarm.appspot.com] or `src/tools/swarming_client/swarming.py bots`
   to determine the PCI IDs of the GPUs in the bots. (These instructions will
   need to be updated for Android bots, which don't have PCI buses.)

    1. Make sure to add these new machines to the Chrome-GPU Swarming pool by
       creating a CL against [`configs/chromium-swarm/bots.cfg`][bots.cfg] in
       the [infradata/config] (Google internal) workspace. Set your git
       `user.email` to your @google.com address if necessary. Here is an
       [example CL](https://chrome-internal-review.googlesource.com/524420).

1. File a Chrome Infrastructure Labs ticket requesting 2 virtual machines for
   the testers. These need to match the OS of the physical machines and
   builders. For example, if you're adding a "Windows 7 CoolNewGPUType" tester,
   you'll need 2 Windows VMs. See this [example
   ticket](http://crbug.com/838975).

1. Once the VMs are ready, create a CL in the
   [`infradata/config`][infradata/config] (Google internal) workspace which
   does the following. Set your git `user.email` to your @google.com address
   if necessary. Here's an [example
   CL](https://chrome-internal-review.googlesource.com/619497).
    1. Adds two new "bot_group" blocks in the Chromium GPU FYI section of
       [`configs/chromium-swarm/bots.cfg`][bots.cfg], one for the Release bot
       and one for the Debug bot. Copy the closest configuration you can find,
       for example Windows, Android, etc.
    1. Get this reviewed and landed. This step associates the VM with the bot's
       name on the waterfall.

1. Create a CL in the Chromium workspace which does the following. Here's an
   [example CL](https://chromium-review.googlesource.com/1041164).
    1. Adds the new machines to [waterfalls.pyl].
        1. The swarming dimensions are crucial. These must match the GPU and
           OS type of the physical hardware in the Swarming pool. This is what
           causes the VMs to spawn their tests on the correct hardware. Make
           sure to use the Chrome-GPU pool, and that the new machines were
           specifically added to that pool.
        1. Make triply sure that there are no collisions between the new
           hardware you're adding and hardware already in the Swarming pool.
           For example, it used to be the case that all of the Windows NVIDIA
           bots ran the same OS version. Later, the Windows 8 flavor bots were
           added. In order to avoid accidentally running tests on Windows 8
           when Windows 7 was intended, the OS in the swarming dimensions of
           the Win7 bots had to be changed from `win` to
           `Windows-2008ServerR2-SP1` (the Win7-like flavor running in our
           data center). Similarly, the Win8 bots had to have a very precise
           OS description (`Windows-2012ServerR2-SP0`).
        1. If you're deploying a new bot that's similar to another existing
           configuration, please search in
           `src/testing/buildbot/test_suite_exceptions.pyl` for references to
           the other bot's name and see if your new bot needs to be added to
           any exclusion lists. For example, some of the tests don't run on
           certain Win bots because of missing OpenGL extensions.
    1. Runs [generate_buildbot_json.py] to regenerate
       `src/testing/buildbot/chromium.gpu.fyi.json`.
    1. Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the two new machines (Release and Debug) inside the
          luci.chromium.ci bucket. This sets up storage for the builds in the
          system. Use the appropriate mixin; for example, "win-gpu-fyi-ci" has
          already been set up for Windows GPU FYI bots on the waterfall.
    1. Updates [`luci-scheduler.cfg`][luci-scheduler.cfg]:
        * Add new "job" blocks for your new Release and Debug test bots. They
          should go underneath the builder which triggers them (like "GPU FYI
          Win Builder"), in alphabetical order. Make sure the "id" and
          "builder" entries match. This job block should use the acl_sets
          "triggered-by-parent-builders", because it's triggered by the
          builder, and not by changes to the git repository.
    1. Updates [`luci-milo.cfg`][luci-milo.cfg]:
        * Add new "builders" blocks for your new testers (Release and Debug)
          on the [`chromium.gpu.fyi`][chromium.gpu.fyi] console. Look at the
          short names and categories and try to come up with a reasonable
          organization.
    1. If you are also adding a new builder, you need to add the new
       machine to [`src/tools/mb/mb_config.pyl`][mb_config.pyl] as well.

1. After the Chromium-side CL lands, it will take some time for all of
   the configuration changes to be picked up by the system. The bot
   will probably be in a red or purple state, claiming that it can't
   find its configuration. (It might also be in an "empty" state, not
   running any jobs at all.)

1. *After* the Chromium-side CL lands and the bot is on the console, create a CL
   in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/1041145).
    1. Adds the new VMs to [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] in
       `scripts/slave/recipe_modules/chromium_tests/`. Make sure to set the
       `serialize_tests` property to `True`. This is specified for waterfall
       bots, but not trybots, and helps avoid overloading the physical
       hardware. Double-check the `BUILD_CONFIG` and `parent_buildername`
       properties for each. They must match the Release/Debug flavor of the
       builder, like `GPU FYI Win Builder` vs. `GPU FYI Win Builder (dbg)`.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed waterfall bot, so it knows which JSON file to load
       out of `src/testing/buildbot` and which entry to look at.
    1. It used to be necessary to retrain recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't
       appear to be necessary any more, but it's something to watch out for if
       your CL fails presubmit for some reason.

1. Note that it is crucial that the bot be deployed before hooking it up in the
   tools/build workspace. In the new LUCI world, if the parent builder can't
   find its child testers to trigger, that's a hard error on the parent. This
   will cause the builders to fail. You can and should prepare the tools/build
   CL in advance, but make sure it doesn't land until the bot's on the console.
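
The "make triply sure there are no collisions" check from the steps above can be approximated in Python. In this hypothetical sketch, each pool entry stands for one hardware/OS *flavor* rather than one machine. Matching again assumes that every requested dimension value must be advertised, and a tester collides if its dimensions match more than one flavor. The data reproduces the Win7/Win8 example: `os: win` matched both flavors, while the precise OS strings match one flavor each.

```python
# Hypothetical pool: one entry per hardware/OS flavor, each advertising the
# set of values it offers for every dimension.
POOL_FLAVORS = {
    "win7-like-nvidia": {
        "gpu": {"10de", "10de:1cb3"},
        "os": {"win", "Windows-2008ServerR2-SP1"},
        "pool": {"Chrome-GPU"},
    },
    "win8-like-nvidia": {
        "gpu": {"10de", "10de:1cb3"},
        "os": {"win", "Windows-2012ServerR2-SP0"},
        "pool": {"Chrome-GPU"},
    },
}


def matches(dims, flavor):
    return all(value in flavor.get(key, set()) for key, value in dims.items())


def find_collisions(testers):
    """Map each tester whose dimensions match more than one flavor to them."""
    collisions = {}
    for tester, dims in testers.items():
        hits = sorted(name for name, flavor in POOL_FLAVORS.items()
                      if matches(dims, flavor))
        if len(hits) > 1:
            collisions[tester] = hits
    return collisions


vague = {
    "Win7 Release (NVIDIA)": {"gpu": "10de:1cb3", "os": "win", "pool": "Chrome-GPU"},
}
precise = {
    "Win7 Release (NVIDIA)": {"gpu": "10de:1cb3", "os": "Windows-2008ServerR2-SP1",
                              "pool": "Chrome-GPU"},
    "Win8 Release (NVIDIA)": {"gpu": "10de:1cb3", "os": "Windows-2012ServerR2-SP0",
                              "pool": "Chrome-GPU"},
}
print(find_collisions(vague))    # {'Win7 Release (NVIDIA)': ['win7-like-nvidia', 'win8-like-nvidia']}
print(find_collisions(precise))  # {}
```

A tester that matches no flavor at all is also worth flagging, since it would have no hardware to run on.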

[bots.cfg]: https://chrome-internal.googlesource.com/infradata/config/+/master/configs/chromium-swarm/bots.cfg
[infradata/config]: https://chrome-internal.googlesource.com/infradata/config/
[cr-buildbucket.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/cr-buildbucket.cfg
[luci-milo.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/luci-milo.cfg
[luci-scheduler.cfg]: https://chromium.googlesource.com/chromium/src/+/master/infra/config/global/luci-scheduler.cfg
[GPU FYI Win Builder]: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/GPU%20FYI%20Win%20Builder
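
The `luci-scheduler.cfg` step has a few easy-to-miss rules: matching "id" and "builder" entries, the `triggered-by-parent-builders` ACL set, and alphabetical order. The real file is a text protobuf. This hypothetical Python sketch models only the relevant fields of the new tester "job" blocks as dicts and checks those rules. The bot names are made up.

```python
# Hypothetical stand-ins for the new tester "job" blocks in luci-scheduler.cfg.
NEW_TESTER_JOBS = [
    {"id": "Win10 FYI Debug (CoolNewGPUType)",
     "builder": "Win10 FYI Debug (CoolNewGPUType)",
     "acl_sets": "triggered-by-parent-builders"},
    {"id": "Win10 FYI Release (CoolNewGPUType)",
     "builder": "Win10 FYI Release (CoolNewGPUType)",
     "acl_sets": "triggered-by-parent-builders"},
]


def check_tester_jobs(jobs):
    """Return a list of problems with triggered tester job blocks."""
    problems = []
    for job in jobs:
        if job["id"] != job["builder"]:
            problems.append(f"id {job['id']!r} != builder {job['builder']!r}")
        if job.get("acl_sets") != "triggered-by-parent-builders":
            problems.append(f"{job['id']!r}: testers are triggered by their parent builder")
    ids = [job["id"] for job in jobs]
    if ids != sorted(ids):
        problems.append("tester jobs are not in alphabetical order")
    return problems


print(check_tester_jobs(NEW_TESTER_JOBS))  # []
```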

### How to start running tests on a new GPU type on an existing try bot

Let's say that you want to cause the `win_chromium_rel_ng` try bot to run tests
on CoolNewGPUType in addition to the types it currently runs (as of this
writing, NVIDIA and AMD). To do this:

1. Make sure there is enough hardware capacity. Unfortunately, tools to report
   utilization of the Swarming pool are still being developed, but a
   back-of-the-envelope estimate is that you will need a minimum of 30
   machines in the Swarming pool to run the current set of GPU tests on the
   tryservers. We estimate that 90 machines will be needed in order to
   additionally run the WebGL 2.0 conformance tests. Plan for the larger
   capacity, as it's desirable to run the larger test suite on as many
   configurations as possible.
2. Deploy Release and Debug testers on the chromium.gpu waterfall, following
   the instructions for the chromium.gpu.fyi waterfall above. You will also
   need to temporarily add suppressions to
   [`tests/masters_recipes_test.py`][tests/masters_recipes_test.py] for these
   new testers, since they aren't yet covered by try bots and are going on a
   non-FYI waterfall. Make sure these run green for a day or two before
   proceeding.
3. Create a CL in the tools/build workspace, adding the new Release tester
   to `win_chromium_rel_ng`'s `bot_ids` list
   in `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Rerun
   `scripts/slave/recipes.py --use-bootstrap test train`.
4. Once the CL in (3) lands, the commit queue will **immediately** start
   running tests on the CoolNewGPUType configuration. Be vigilant and make
   sure that tryjobs are green. If they are red for any reason, revert the CL
   and figure out offline what went wrong.

[tests/masters_recipes_test.py]: https://chromium.googlesource.com/chromium/tools/build/+/master/tests/masters_recipes_test.py
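
Steps 1 and 3 can be sanity-checked with a short Python sketch. The machine-count thresholds are the back-of-the-envelope estimates quoted above, not measured values. The `bot_ids` entry shape and the builder/tester names are hypothetical simplifications of `trybots.py`.

```python
# Back-of-the-envelope capacity estimates from this page.
MIN_MACHINES_CURRENT_TESTS = 30
MIN_MACHINES_WITH_WEBGL2 = 90


def capacity_verdict(machine_count):
    """Classify a Swarming pool size against the estimates above."""
    if machine_count >= MIN_MACHINES_WITH_WEBGL2:
        return "current tests plus WebGL 2.0 conformance"
    if machine_count >= MIN_MACHINES_CURRENT_TESTS:
        return "current tests only"
    return "insufficient"


# Hypothetical, simplified view of a try bot entry in trybots.py.
win_chromium_rel_ng = {
    "bot_ids": [
        {"buildername": "GPU Win Builder", "tester": "Win10 Release (NVIDIA)"},
    ],
}


def add_mirrored_tester(trybot, buildername, tester):
    """Append a mirrored (builder, tester) pair unless it is already present."""
    entry = {"buildername": buildername, "tester": tester}
    if entry not in trybot["bot_ids"]:
        trybot["bot_ids"].append(entry)
    return trybot


print(capacity_verdict(45))  # current tests only
add_mirrored_tester(win_chromium_rel_ng, "GPU Win Builder", "Win10 Release (CoolNewGPUType)")
print(len(win_chromium_rel_ng["bot_ids"]))  # 2
```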

### How to add a new manually-triggered trybot

There are a lot of one-off GPU types on the chromium.gpu.fyi waterfall, and
sometimes a failure happens on just one type. It's helpful to be able to send
a tryjob to that particular machine. Doing so requires a specific trybot to be
set up, because most if not all of the existing trybots trigger tests on more
than one type of GPU.

Here are the steps to set up a new trybot which runs tests on just one
particular GPU type. Let's consider that we are adding a manually-triggered
trybot for the Win7 NVIDIA GPUs in Release mode. We will call the new bot
`gpu_manual_try_win7_nvidia_rel`.

1. File a Chrome Infrastructure Labs ticket requesting ~3 virtual
   machines. These will do builds and trigger jobs on the physical hardware,
   and need to match the OS of the physical machines. See this [example
   ticket](http://crbug.com/839216).

1. Once the VMs are ready, create a CL in the
   [`infradata/config`][infradata/config] (Google internal) workspace which
   does the following. Set your git `user.email` to your @google.com address
   if necessary. Here's an [example
   CL](https://chrome-internal-review.googlesource.com/620773).
    1. Adds a new "bot_group" block in the "manually-triggered GPU trybots"
       section of [`configs/chromium-swarm/bots.cfg`][bots.cfg]. Look in the
       optional GPU tryserver section for the closest configuration you can
       find to copy from, for example Windows, Android, etc.
       (`win_optional_gpu_tests_rel`, `android_optional_gpu_tests_rel`). The
       "dimensions" tag contains the name of the trybot,
       e.g. "builder:gpu_manual_try_win7_nvidia_rel".
    1. Get this reviewed and landed. This step makes these machines the ones
       which perform the builds for this new trybot.

1. Create a CL in the Chromium workspace which does the following. Here's an
   [example CL](https://chromium-review.googlesource.com/1044767).
    1. Updates [`cr-buildbucket.cfg`][cr-buildbucket.cfg]:
        * Add the new trybot to the `luci.chromium.try` bucket. This is a
          one-liner, with "name" being "gpu_manual_try_win7_nvidia_rel" and
          "mixins" being the OS-appropriate mixin, in this case
          "win-optional-gpu-try". (We're repurposing the existing ACLs for the
          "optional" GPU trybots for these manually-triggered ones.)
    1. Updates [`luci-milo.cfg`][luci-milo.cfg]:
        * Add "builders" blocks for the new trybot to the `luci.chromium.try` and
          `tryserver.chromium.win` consoles.
    1. Adds the new trybot to
       [`src/tools/mb/mb_config.pyl`][mb_config.pyl]. Reuse the same mixin as
       for the optional GPU trybot; in this case,
       `gpu_fyi_tests_release_trybot_x86`.
    1. Get this CL reviewed and landed.

1. Create a CL in the [`tools/build`][tools/build] workspace which does the
   following. Here's an [example
   CL](https://chromium-review.googlesource.com/1044761).

    1. Adds the new trybot to a "Manually-triggered GPU trybots" section in
       `scripts/slave/recipe_modules/chromium_tests/trybots.py`. Create this
       section after the "Optional GPU bots" section for the appropriate
       tryserver (`tryserver.chromium.win`, `tryserver.chromium.mac`,
       `tryserver.chromium.linux`, `tryserver.chromium.android`). Have the bot
       mirror the appropriate waterfall bot; in this case, the buildername to
       mirror is `GPU FYI Win Builder` and the tester is
       `Win7 FYI Release (NVIDIA)`.
    1. Adds an exception for your new trybot in `tests/masters_recipes_test.py`,
       under `FAKE_BUILDERS`, under the appropriate tryserver waterfall (in
       this case, `master.tryserver.chromium.win`). This is needed because this
       is a LUCI-only bot, and this test verifies the old buildbot
       configurations.
    1. Get this reviewed and landed. This step tells the Chromium recipe about
       the newly-deployed trybot, so it knows which JSON file to load out of
       `src/testing/buildbot` and which entry to look at to understand which
       tests to run and on what physical hardware.
    1. It used to be necessary to retrain recipe expectations
       (`scripts/slave/recipes.py --use-bootstrap test train`). This doesn't
       appear to be necessary any more, but it's something to watch out for if
       your CL fails presubmit for some reason.
485
Kenneth Russellfc566142018-06-26 22:34:15486At this point the new trybot should automatically show up in the
487"Choose tryjobs" pop-up in the Gerrit UI, under the
488`luci.chromium.try` heading, because it was deployed via LUCI. It
489should be possible to send a CL to it.
Kenneth Russell3a8e5c022018-05-04 21:14:49490
Kenneth Russellfc566142018-06-26 22:34:15491(It should not be necessary to modify buildbucket.config as is
492mentioned at the bottom of the "Choose tryjobs" pop-up. Contact the
493chrome-infra team if this doesn't work as expected.)
Kenneth Russell3a8e5c022018-05-04 21:14:49494
495[chromium/src]: https://chromium-review.googlesource.com/q/project:chromium%252Fsrc+status:open
496[go/chromecals]: http://go/chromecals
497
498
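The new trybot's name has to be registered consistently in several of the files touched above. The Python sketch below models those registrations as plain dictionaries and checks that each one contains the trybot. The dictionary layouts and the helper `check_registrations` are hypothetical simplifications for illustration. They are not the real schemas of `mb_config.pyl`, `trybots.py`, or `masters_recipes_test.py`, so copy the actual structure from neighboring entries in each file.

```python
"""Hypothetical consistency check for a manually-triggered GPU trybot.

The dict layouts below are simplified stand-ins, NOT the real file schemas.
"""

TRYBOT = "gpu_manual_try_win7_nvidia_rel"
TRYSERVER = "tryserver.chromium.win"

# Stand-in for the tryserver.chromium.win section of mb_config.pyl:
# trybot name -> MB config, reused from the optional GPU trybot.
MB_CONFIG = {
    TRYSERVER: {
        "win_optional_gpu_tests_rel": "gpu_fyi_tests_release_trybot_x86",
        TRYBOT: "gpu_fyi_tests_release_trybot_x86",
    },
}

# Stand-in for trybots.py: which waterfall builder and tester to mirror.
TRYBOTS = {
    TRYSERVER: {
        TRYBOT: {
            "mastername": "chromium.gpu.fyi",
            "buildername": "GPU FYI Win Builder",
            "tester": "Win7 FYI Release (NVIDIA)",
        },
    },
}

# Stand-in for FAKE_BUILDERS in tests/masters_recipes_test.py:
# LUCI-only bots that the buildbot-config test should skip.
FAKE_BUILDERS = {
    "master.tryserver.chromium.win": [TRYBOT],
}


def check_registrations(trybot, tryserver):
    """Return the names of the files (by role) that are missing the trybot."""
    missing = []
    if trybot not in MB_CONFIG.get(tryserver, {}):
        missing.append("mb_config.pyl")
    if trybot not in TRYBOTS.get(tryserver, {}):
        missing.append("trybots.py")
    if trybot not in FAKE_BUILDERS.get("master." + tryserver, []):
        missing.append("masters_recipes_test.py")
    return missing


if __name__ == "__main__":
    print(check_registrations(TRYBOT, TRYSERVER) or "all registrations present")
```

An empty result means every stand-in registration is present. Any names it returns point at the file that still needs the new entry.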
### How to add a new "optional" try bot

TODO(kbr): the naming of the "optional" try bots is confusing and
unfortunate. They should probably be renamed to something like "extratests" or
"extra_tests", following a new naming convention such as
"gpu_win_extratests_rel" or "win_gpu_extratests_rel". Unfortunately, making
this change at this point requires touching many files across many
workspaces, and is unlikely to happen unless someone highly motivated picks up
the task.

The "optional" GPU try bots are a concession to the reality that some
long-running GPU test suites simply cannot run against every Chromium CL.
They run additional tests that normally run only on the chromium.gpu.fyi
waterfall. Some of these tests, like the WebGL 2.0 conformance suite, are
intended to run on the normal try bots once hardware capacity is available.
Others are not intended to ever run on the normal try bots.

The optional try bots are a little different because they mirror waterfall bots
that don't actually exist. The waterfall bots' specifications exist only to
tell the optional try bots which tests to run.

Let's say that you intend to add a new optional try bot on Windows, called
`win_new_optional_tests_rel` for example. If you just wanted to add this GPU
type to the existing `win_optional_gpu_tests_rel` try bot, you would follow the
instructions above
([How to start running tests on a new GPU type on an existing try bot](#How-to-start-running-tests-on-a-new-GPU-type-on-an-existing-try-bot)).
The steps below describe how to spin up an entire new optional try bot.

1. Make sure that you have some Swarming capacity for the new GPU type. Since
   it won't run against every Chromium CL, you don't need the recommended
   minimum of 30 bots, though ~10 would be good.
1. Create a CL in the Chromium workspace:
    1. Add your new bot (for example, "Optional Win7 Release
       (CoolNewGPUType)") to the chromium.gpu.fyi waterfall in
       [waterfalls.pyl]. (Note that this is a bad example: the "optional" bots
       have special semantics in this script. You'd probably want to define
       a new category of bot if you didn't intend to add this to
       `win_optional_gpu_tests_rel`.)
    1. Re-run the script to regenerate the JSON files.
1. Land the above CL.
1. Create a CL in the tools/build workspace:
    1. Modify `masters/master.tryserver.chromium.win`'s [master.cfg] and
       [slaves.cfg] to add the new tryserver. Follow the pattern for the
       existing `win_optional_gpu_tests_rel` tryserver: add the new entry to
       `master.cfg`, and add the new tryserver to the `optional_builders`
       list in `slaves.cfg`.
    1. Modify [`chromium_gpu_fyi.py`][chromium_gpu_fyi.py] to add the new
       "Optional Win7 Release (CoolNewGPUType)" entry.
    1. Modify [`trybots.py`][trybots.py] to add the new
       `win_new_optional_tests_rel` try bot, mirroring "Optional Win7 Release
       (CoolNewGPUType)".
1. Land the above CL and request an off-hours restart of the
   tryserver.chromium.win waterfall.
1. Now you can send CLs to the new bot with:
   `git cl try -m tryserver.chromium.win -b win_new_optional_tests_rel`

[master.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/master.cfg
[slaves.cfg]: https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.tryserver.chromium.win/slaves.cfg

### How to test and deploy a driver update

Let's say that you want to roll out an update to the graphics drivers on one of
the configurations like the Win7 NVIDIA bots. The responsible way to do this is
to run the new driver on one of the waterfalls for a day or two to make sure
the tests are running reliably green before rolling out the driver update
everywhere. To do this:

1. Make sure that all of the current Swarming jobs for this OS and GPU
   configuration are targeted at the "stable" version of the driver in
   [waterfalls.pyl].
1. File a `Build Infrastructure` bug, component `Infra>Labs`, to have ~4 of the
   physical machines already in the Swarming pool upgraded to the new version
   of the driver.
1. If an "experimental" version of this bot doesn't yet exist, follow the
   instructions above for [How to add a new tester bot to the chromium.gpu.fyi
   waterfall](#How-to-add-a-new-tester-bot-to-the-chromium_gpu_fyi-waterfall)
   to deploy one.
1. Have this experimental bot target the new version of the driver in
   [waterfalls.pyl].
1. Hopefully, the upgraded machines will pass the pixel tests. If they don't,
   then unfortunately it'll be necessary to follow the instructions on
   [updating the pixel tests] to temporarily suppress the failures on this
   particular configuration. Keep the time window for these test suppressions
   as narrow as possible.
1. Watch the upgraded machines for a day or two to make sure they're stable.
1. When they are, update [waterfalls.pyl] to use the
   "gpu trigger script" functionality to select *either* the stable *or* the
   new driver version on the stable version of the bot. See [this
   CL](https://chromium-review.googlesource.com/1189059) for an example, though
   that CL was targeting a different OS version rather than driver version.
1. After that lands, ask the Chrome Infrastructure Labs team to roll out the
   driver update across all of the similarly configured bots in the Swarming
   pool.
1. If necessary, update pixel test expectations and remove the suppressions
   added above.
1. Remove the alternate Swarming dimensions for the stable bot from
   [waterfalls.pyl], locking it to the new driver version. See [this
   CL](https://chromium-review.googlesource.com/1197329) for an example, though
   that CL was targeting a different OS version rather than driver version.

Note that we leave the experimental bot in place. We could reclaim it, but it
seems worthwhile to continuously test the "next" version of graphics drivers as
well as the current stable ones.

[updating the pixel tests]: https://www.chromium.org/developers/testing/gpu-testing/#TOC-Updating-and-Adding-New-Pixel-Tests-to-the-GPU-Bots

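During the rollout, the stable bot is briefly allowed to run on either driver version, then locked to the new one. The Python sketch below models the three phases as lists of alternative Swarming dimension sets. It picks one set per task by preferring the set with the most idle bots. That selection policy, the placeholder `gpu` dimension strings, and the names `PHASES` and `pick_dimension_set` are assumptions made for illustration. This is not the real GPU trigger script's implementation, and its policy may differ.

```python
"""Sketch of driver-rollout Swarming dimension sets.

Driver strings and the selection policy are hypothetical; this is not the
implementation of the real GPU trigger script.
"""

BASE = {"pool": "Chrome-GPU"}
OLD = dict(BASE, gpu="10de:DEVICE-OLD_DRIVER")  # placeholder stable driver
NEW = dict(BASE, gpu="10de:DEVICE-NEW_DRIVER")  # placeholder new driver

# Which dimension sets the stable bot may target in each phase of the rollout.
PHASES = {
    "before": [OLD],        # stable bot pinned to the stable driver
    "rollout": [OLD, NEW],  # either driver is acceptable
    "after": [NEW],         # alternate dimensions removed; locked to new driver
}


def pick_dimension_set(dimension_sets, idle_bots):
    """Pick the dimension set with the most idle bots (hypothetical policy).

    idle_bots maps a gpu dimension value to its number of idle bots.
    max() returns the first maximal element, so ties go to the earliest set.
    """
    return max(dimension_sets, key=lambda dims: idle_bots.get(dims["gpu"], 0))


if __name__ == "__main__":
    idle = {OLD["gpu"]: 3, NEW["gpu"]: 7}
    for phase, sets in PHASES.items():
        print(phase, pick_dimension_set(sets, idle)["gpu"])
```

Modeling the phases explicitly makes the final cleanup step visible. Once `rollout` becomes `after`, only one dimension set remains and the choice is forced.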
## Credentials for various servers

Working with the GPU bots requires credentials to various services: the isolate
server, the Swarming server, and cloud storage.

### Isolate server credentials

To upload and download isolates you must first authenticate to the isolate
server. From a Chromium checkout, run:

* `./src/tools/swarming_client/auth.py login
  --service=https://isolateserver.appspot.com`

This will open a web browser to complete the authentication flow. A @google.com
email address is required in order to properly authenticate.

To test your authentication, find a hash for a recent isolate. Consult the
instructions on [Running Binaries from the Bots Locally] to find a random hash
from a target like `gl_tests`. Then run the following:

[Running Binaries from the Bots Locally]: https://www.chromium.org/developers/testing/gpu-testing#TOC-Running-Binaries-from-the-Bots-Locally

If authentication succeeded, this will silently download a file called
`delete_me` into the current working directory. If it failed, the script will
report multiple authentication errors. In this case, use the following command
to log out and then try again:

* `./src/tools/swarming_client/auth.py logout
  --service=https://isolateserver.appspot.com`

### Swarming server credentials

The Swarming server uses the same `auth.py` script as the isolate server. You
will need to authenticate if you want to manually download the results of
previous Swarming jobs, trigger your own jobs, or run `swarming.py reproduce`
to re-run a remote job on your local workstation. Follow the instructions
above, replacing the service with `https://chromium-swarm.appspot.com`.

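The isolate and Swarming servers share the same `auth.py` script and differ only in the `--service` value, so a small wrapper can build the login and logout commands for either. This is a hypothetical convenience sketch in Python. The script path and the `login`/`logout`/`--service` arguments come from the commands above, while `auth_command` and `run_auth` are made-up names.

```python
"""Hypothetical helper for the auth.py login/logout commands shown above."""

import subprocess

# Script path from the commands above; run from the directory containing src/.
AUTH_PY = "./src/tools/swarming_client/auth.py"


def auth_command(action, service_url):
    """Return the argv for `auth.py <action> --service=<service_url>`."""
    if action not in ("login", "logout"):
        raise ValueError("action must be 'login' or 'logout'")
    return [AUTH_PY, action, "--service=" + service_url]


def run_auth(action, service_url):
    """Run auth.py; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(auth_command(action, service_url), check=True)
```

For example, passing `"login"` and the Swarming server URL above yields the Swarming login command. Keeping the URL as a parameter avoids hard-coding either service.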
### Cloud storage credentials

Authentication to Google Cloud Storage is needed for a couple of reasons:
uploading pixel test results to the cloud, and potentially uploading and
downloading builds as well, at least in Debug mode. Use the copy of gsutil in
`depot_tools/third_party/gsutil/gsutil`, and follow the [Google Cloud Storage
instructions] to authenticate. You must use your @google.com email address and
be a member of the Chrome GPU team in order to receive read-write access to the
appropriate cloud storage buckets. Roughly:

1. Run `gsutil config`.
2. Copy and paste the URL into your browser.
3. Log in with your @google.com account.
4. Allow the app to access the information it requests.
5. Copy and paste the resulting key back into your terminal.
6. Press "enter" when prompted for a project-id (i.e., leave it empty).

At this point you should be able to write to the cloud storage bucket.

Navigate to
<https://console.developers.google.com/storage/chromium-gpu-archive> to view
the contents of the cloud storage bucket.

[Google Cloud Storage instructions]: https://developers.google.com/storage/docs/gsutil
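After running `gsutil config`, a quick read-only check is to list the bucket. The Python sketch below wraps `gsutil ls` for that purpose. The bucket name `gs://chromium-gpu-archive` is inferred from the console URL above and is an assumption, and the helper names are hypothetical. A successful listing shows only that read access works. Write access still depends on the team membership described above.

```python
"""Hypothetical read-access check for the GPU archive bucket using gsutil."""

import subprocess

# gsutil path from the text above, relative to the directory that contains
# depot_tools (adjust as needed).
GSUTIL = "depot_tools/third_party/gsutil/gsutil"
# Bucket name inferred from the console URL above; this is an assumption.
BUCKET = "gs://chromium-gpu-archive"


def gsutil_ls_command(gsutil=GSUTIL, bucket=BUCKET):
    """Build the argv for `gsutil ls <bucket>`, which only reads the bucket."""
    return [gsutil, "ls", bucket]


def can_list_bucket(gsutil=GSUTIL, bucket=BUCKET):
    """Return True if `gsutil ls` succeeds; False on failure or missing gsutil."""
    try:
        result = subprocess.run(
            gsutil_ls_command(gsutil, bucket),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # e.g. gsutil not found or not executable
        return False
    return result.returncode == 0
```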