# GPU Bots & Pixel Wrangling

![](images/wrangler.png)

(December 2017: presentation on GPU bots and pixel wrangling: see [slides].)

GPU Pixel Wrangling is the process of keeping various GPU bots green. On the
GPU bots, tests run on physical hardware with real GPUs, not in VMs like the
majority of the bots on the Chromium waterfall.

[slides]: https://docs.google.com/presentation/d/1sZjyNe2apUhwr5sinRfPs7eTzH-3zO0VQ-Cj-8DlEDQ/edit?usp=sharing

[TOC]

## Fleet Status

* [Chrome GPU Fleet Status](http://vi/chrome-infra/Projects/gpu)

(Sorry, this link is Google internal only.)

These graphs show 1 day of activity by default. The drop-down boxes at the top
allow viewing of longer durations.

See [this CL](http://cl/238562533) for an example of how to update these graphs.

## GPU Bots' Waterfalls

The waterfalls work much like any other; see the [Tour of the Chromium Buildbot
Waterfall] for a more detailed explanation of how this is laid out. We have
more subtle configurations because the GPU matters, not just the OS and release
v. debug. Hence we have Windows Nvidia Release bots, Mac Intel Debug bots, and
so on. The waterfalls we’re interested in are:

* [Chromium GPU]
    * Various operating systems, configurations, GPUs, etc.
* [Chromium GPU FYI]
    * These bots run less-standard configurations like Windows with AMD GPUs,
      Linux with Intel GPUs, etc.
    * These bots build with top of tree ANGLE rather than the `DEPS` version.
    * The [ANGLE tryservers] help ensure that these bots stay green. However,
      ANGLE changes may turn these bots red while the chromium.gpu bots
      remain green.
    * The [ANGLE Wrangler] is on-call to help resolve ANGLE-related breakage
      on this waterfall.
    * To determine if a different ANGLE revision was used between two builds,
      compare the `got_angle_revision` buildbot property on the GPU builders
      or `parent_got_angle_revision` on the testers. This revision can be
      used to do a `git log` in the `third_party/angle` repository.
* [Chromium SwANGLE]
    * These bots run GPU tests on top of ANGLE's GLES implementation running
      on top of SwiftShader's Vulkan implementation purely in software.
      Regressions should mostly be handled by the [ANGLE Wrangler], but some
      failures fall into the Pixel Wrangler's domain, for example, WebGL
      failures due to Chromium-side and WebGL-side changes on the
      linux-swangle-chromium-x64, mac-swangle-chromium-x64 and
      win-swangle-chromium-x86 bots.

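The ANGLE revision comparison described for the Chromium GPU FYI bots can be
sketched roughly as follows. This is only an illustration: the helper name
`angle_log_command` and the placeholder hashes are hypothetical, and the two
revisions are assumed to have been copied by hand from the
`got_angle_revision` (or `parent_got_angle_revision`) property of the two
builds being compared.

```python
import shlex


def angle_log_command(old_rev, new_rev, angle_dir="third_party/angle"):
    """Return the argv that lists ANGLE commits after old_rev up to new_rev."""
    for rev in (old_rev, new_rev):
        # Reject empty values and anything git could mistake for an option.
        if not rev or rev.startswith("-"):
            raise ValueError(f"suspicious revision: {rev!r}")
    # `git -C <dir>` runs git as if it had been started in <dir>;
    # `A..B` selects the commits reachable from B but not from A.
    return ["git", "-C", angle_dir, "log", "--oneline", f"{old_rev}..{new_rev}"]


# Placeholder hashes (hypothetical); substitute the values from the two builds.
print(shlex.join(angle_log_command("1111111", "2222222")))
```

The printed command can be pasted into a shell at the root of a Chromium
checkout. An empty log suggests both builds used the same ANGLE revision.
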
<!-- TODO(kainino): update link when the page is migrated -->
[Tour of the Chromium Buildbot Waterfall]: http://www.chromium.org/developers/testing/chromium-build-infrastructure/tour-of-the-chromium-buildbot
[Chromium GPU]: https://ci.chromium.org/p/chromium/g/chromium.gpu/console?reload=120
[Chromium GPU FYI]: https://ci.chromium.org/p/chromium/g/chromium.gpu.fyi/console?reload=120
[Chromium SwANGLE]: https://ci.chromium.org/p/chromium/g/chromium.swangle/console?reload=120
[ANGLE tryservers]: https://build.chromium.org/p/tryserver.chromium.angle/waterfall
[ANGLE Wrangler]: https://chromium.googlesource.com/angle/angle/+/master/infra/ANGLEWrangling.md

## Test Suites

The bots run several test suites. The majority of them have been migrated to
the Telemetry harness, and are run within the full browser, in order to better
test the code that is actually shipped. As of this writing, the tests include:

* Tests using the Telemetry harness:
    * The WebGL conformance tests: `webgl_conformance_integration_test.py`
    * A Google Maps test: `maps_integration_test.py`
    * Context loss tests: `context_lost_integration_test.py`
    * Depth capture tests: `depth_capture_integration_test.py`
    * GPU process launch tests: `gpu_process_integration_test.py`
    * Hardware acceleration validation tests:
      `hardware_accelerated_feature_integration_test.py`
    * Pixel tests validating the end-to-end rendering pipeline:
      `pixel_integration_test.py`
    * Stress tests of the screenshot functionality other tests use:
      `screenshot_sync_integration_test.py`
* `angle_unittests`: see `src/third_party/angle/src/tests/BUILD.gn`
* drawElements tests (on the chromium.gpu.fyi waterfall): see
  `src/third_party/angle/src/tests/BUILD.gn`
* `gles2_conform_test` (requires internal sources): see
  `src/gpu/gles2_conform_support/BUILD.gn`
* `gl_tests`: see `src/gpu/BUILD.gn`
* `gl_unittests`: see `src/ui/gl/BUILD.gn`
* `rendering_representative_perf_tests` (on the chromium.gpu.fyi waterfall):
  see `src/chrome/test/BUILD.gn`

And more. See
[`src/testing/buildbot/README.md`](../../testing/buildbot/README.md)
and the GPU sections of `test_suites.pyl` and `waterfalls.pyl` for the
complete description of bots and tests.

Additionally, the Release bots run:

* `tab_capture_end2end_tests`: see
  `src/chrome/browser/extensions/api/tab_capture/tab_capture_apitest.cc` and
  `src/chrome/browser/extensions/api/cast_streaming/cast_streaming_apitest.cc`

### More Details

More details about the bots' setup can be found on the [GPU Testing] page.

[GPU Testing]: https://sites.google.com/a/chromium.org/dev/developers/testing/gpu-testing

## Wrangling

### Prerequisites

1. Ideally a wrangler should be a Chromium committer. If you're on the GPU
    pixel wrangling rotation, there will be an email notifying you of the
    upcoming shift, and a calendar appointment.
    * If you aren't a committer, don't panic. It's still best for everyone on
      the team to become acquainted with the procedures of maintaining the
      GPU bots.
    * In this case you'll upload CLs to Gerrit to perform reverts (optionally
      using the new "Revert" button in the UI), and might consider using
      `Tbr:` to speed through trivial and urgent CLs. In general, try to send
      all CLs through the commit queue.
    * Contact bajones, kainino, kbr, vmiura, zmo, or another member of the
      Chrome GPU team who's already a committer for help landing patches or
      reverts during your shift.
1. Apply for [access to the bots].
1. You may want to install the [Flake linker] extension, which adds several
    useful features to the bot build log pages.
    * Links to Chromium flakiness dashboard from build result pages, so you
      can see all failures for a single test across the fleet.
    * Automatically hides green build steps so you can see the failure
      immediately.
    * Turns build log links into deep links directly to the failure line in
      the log.

[access to the bots]: https://sites.google.com/a/google.com/chrome-infrastructure/golo/remote-access?pli=1
[Flake linker]: https://chrome.google.com/webstore/detail/flake-linker/boamnmbgmfnobomddmenbaicodgglkhc

### How to Keep the Bots Green

1. Watch for redness on the tree.
    1. [Sheriff-O-Matic] now has support for all the
        [GPU Bots' Waterfalls](#GPU-Bots_Waterfalls) under the
        [Chromium GPU][Sheriff-O-Matic] tab!
    1. The bots are expected to be green all the time. Flakiness on these bots
        is neither expected nor acceptable.
    1. If a bot goes consistently red, it's necessary to figure out whether a
        recent CL caused it, or whether it's a problem with the bot or
        infrastructure.
    1. If it looks like a problem with the bot (deep problems like failing to
        check out the sources, the isolate server failing, etc.) notify the
        Chromium troopers and file a P1 bug with labels: Infra\>Labs,
        Infra\>Troopers and Internals\>GPU\>Testing. See the general [tree
        sheriffing page] for more details.
    1. Otherwise, examine the builds just before and after the redness was
        introduced, and look at the revisions in those builds to narrow down
        the regression range.
    1. **File a bug** capturing the regression range and excerpts of any
        associated logs. Regressions should be marked P1. CC engineers who you
        think may be able to help triage the issue. Keep in mind that the logs
        on the bots expire after a few days, so make sure to add copies of
        relevant logs to the bug report.
    1. Use the `Hotlist=PixelWrangler` label to mark bugs that require the
        pixel wrangler's attention, so it's easy to find relevant bugs when
        handing off shifts.
    1. Study the regression range carefully. Use drover to revert any CLs
        which break the chromium.gpu bots. Use your judgment about
        chromium.gpu.fyi, since not all bots are covered by trybots. In the
        revert message, provide a clear description of what broke, links to
        failing builds, and excerpts of the failure logs, because the build
        logs expire after a few days.
1. Make sure the bots are running jobs.
    1. Keep an eye on the console views of the various bots.
    1. Make sure the bots are all actively processing jobs. If they go offline
        for a long period of time, the "summary bubble" at the top may still be
        green, but the column in the console view will be gray.
    1. Email the Chromium troopers if you find a bot that's not processing
        jobs.
1. Make sure the GPU try servers are in good health.
    1. The GPU try servers are no longer distinct bots on a separate
        waterfall, but instead run as part of the regular tryjobs on the
        Chromium waterfalls. The GPU tests run as part of the following
        tryservers' jobs:
        1. <code>[linux-rel]</code> on the [luci.chromium.try] waterfall
        1. <code>[mac-rel]</code> on the [luci.chromium.try] waterfall
        1. <code>[win7-rel]</code> on the [luci.chromium.try] waterfall
    1. The best tool to use to quickly find flakiness on the tryservers is the
        new [Chromium Try Flakes] tool. Look for the names of GPU tests (like
        maps_pixel_test) as well as the test machines (e.g. mac-rel). If you
        see a flaky test, file a bug like [this one](http://crbug.com/444430).
        Also look for compile flakes that may indicate that a bot needs to be
        clobbered. Contact the Chromium sheriffs or troopers if so.
    1. Glance at these trybots from time to time and see if any GPU tests are
        failing frequently. **Note** that test failures are **expected** on
        these bots: individuals' patches may fail to apply, fail to compile, or
        break various tests. Look specifically for patterns in the failures. It
        isn't necessary to spend a lot of time investigating each individual
        failure. (Use the "Show: 200" link at the bottom of the page to see
        more history.)
    1. If the same set of tests is failing repeatedly, look at the individual
        runs. Examine the swarming results and see whether they're all running
        on the same machine. (This is the "Bot assigned to task" when clicking
        any of the test's shards in the build logs.) If they are, something
        might be wrong with the hardware. Use the [Swarming Server Stats] tool
        to drill down into the specific builder.
    1. If you see the same test failing in a flaky manner across multiple
        machines and multiple CLs, it's crucial to investigate why it's
        happening. [crbug.com/395914](http://crbug.com/395914) was one example
        of an innocent-looking Blink change which made it through the commit
        queue and introduced widespread flakiness in a range of GPU tests. The
        failures were also most visible on the try servers as opposed to the
        main waterfalls.
1. Check if any pixel test failures are actual failures or need to be
    rebaselined.
    1. For a given build failing the pixel tests, look for either:
        1. One or more links named `gold_triage_link for <test name>`. This
            will be the case if there are fewer than 10 links. If the test was
            run on a trybot, the link will instead be named
            `triage_link_for_entire_cl for <test name>` (the weird naming comes
            from how the recipe processes and displays links).
        1. A single link named
            `Too many artifacts produced to link individually, click for links`.
            This will be the case if there are 10 or more links.
    1. In either case, follow the link(s) to the triage page for the image the
        failing test produced.
        1. If the test was run on a trybot, all the links will point to the
            same page, which will be the triage page for every untriaged image
            produced by the CL being tested.
    1. Ensure you are signed in to the Gold server the links take you to (both
        @google.com and @chromium.org accounts work).
    1. Triage images on those pages (typically by approving them, but you can
        mark them as negative if it is an image that should not be produced).
        In the case of a negative image, a bug should be filed on
        [crbug](https://crbug.com) to investigate and fix the cause of that
        particular image being produced, as future occurrences of it will cause
        the test to fail. Such bugs should include the `Internals>GPU>Testing`
        component and whatever component is suitable for the type of failing
        test (likely `Blink>WebGL` or `Blink>Canvas`). The test should also be
        marked as failing or skipped (see the item below on updating the
        Telemetry-based test expectations) so that the test failure doesn't
        show up as a builder failure. If the failure is consistent, prefer
        skipping the test over marking it as failing so that the failure links
        don't pile up. If the failure occurs on the trybots, include the change
        to the expectations in your CL.
    1. Additional, less common triage steps for the pixel tests can be found in
        [this section][gold less common failures] of the GPU Gold documentation.
1. Update Telemetry-based test expectations if necessary.
    1. Most of the GPU tests are run inside a full Chromium browser, launched
        by Telemetry, rather than a Gtest harness. The tests and their
        expectations are contained in
        [src/content/test/gpu/gpu_tests/test_expectations]. See
        for example <code>[webgl_conformance_expectations.txt]</code>,
        <code>[gpu_process_expectations.txt]</code> and
        <code>[pixel_expectations.txt]</code>.
    1. See the header of the file for a list of modifiers to specify a bot
        configuration. It is possible to specify OS (down to a specific
        version, say, Windows 7 or Mountain Lion), GPU vendor
        (NVIDIA/AMD/Intel), and a specific GPU device.
    1. The key is to maintain the highest coverage: if you have to disable a
        test, disable it only on the specific configurations it's failing. Note
        that it is not possible to discern between Debug and Release
        configurations.
    1. Mark tests failing or skipped, which will suppress flaky failures, only
        as a last resort. It is only really necessary to suppress failures that
        are showing up on the GPU tryservers, since failing tests no longer
        close the Chromium tree.
    1. Please read the section on [stamping out flakiness] for motivation on
        how important it is to eliminate flakiness rather than hiding it.
    1. For failures of rendering_representative_perf_tests please refer to its
        [instructions on updating expectations][rendering_representative_perf_tests].
1. For the remaining Gtest-style tests, use the [`DISABLED_`
    modifier][gtest-DISABLED] to suppress any failures if necessary.

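The "are the failures all on the same machine?" check from the try server step
above can be sketched as a small grouping exercise. This is a minimal sketch
under stated assumptions: the function name `suspect_bots`, its thresholds, and
the bot names in the sample data are all hypothetical, and the
`(test_name, bot_id)` pairs are assumed to have been collected by hand from the
"Bot assigned to task" field of each failing shard.

```python
from collections import Counter


def suspect_bots(failures, min_failures=3, min_share=0.8):
    """Return bot IDs that account for at least min_share of all failures.

    failures: iterable of (test_name, bot_id) pairs, one per failing shard.
    Both thresholds are arbitrary illustrative defaults.
    """
    counts = Counter(bot_id for _test, bot_id in failures)
    total = sum(counts.values())
    if total < min_failures:
        return []  # Too few data points to call it a pattern.
    return sorted(bot for bot, n in counts.items() if n / total >= min_share)


# Hypothetical data: five failures on one machine, one on another.
concentrated = [("maps_pixel_test", "bot-17")] * 5 + [("maps_pixel_test", "bot-03")]
# Hypothetical data: failures spread evenly over three machines.
spread = [("maps_pixel_test", b) for b in ("bot-01", "bot-02", "bot-03")] * 2

print(suspect_bots(concentrated))  # one machine dominates: likely hardware
print(suspect_bots(spread))        # no dominant machine: likely a real flake
```

A non-empty result points toward a hardware or bot problem. An empty result
with many failures points toward a flaky test or a bad CL, which is the case
the list above asks the wrangler to investigate.
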
[Sheriff-O-Matic]: https://sheriff-o-matic.appspot.com/chromium.gpu
[tree sheriffing page]: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs
[linux-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/linux-rel
[luci.chromium.try]: https://ci.chromium.org/p/chromium/g/luci.chromium.try/builders
[mac-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/mac-rel
[tryserver.chromium.mac]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.mac/builders
[win7-rel]: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7-rel
[tryserver.chromium.win]: https://ci.chromium.org/p/chromium/g/tryserver.chromium.win/builders
[Chromium Try Flakes]: http://chromium-try-flakes.appspot.com/
<!-- TODO(kainino): link doesn't work, but is still included from chromium-swarm homepage so not removing it now -->
[Swarming Server Stats]: https://chromium-swarm.appspot.com/stats
[gold less common failures]: gpu_pixel_testing_with_gold.md#Triaging-Less-Common-Failures
[Chrome Internal GPU Pixel Wrangling Instructions]: https://sites.google.com/a/google.com/client3d/documents/chrome-internal-gpu-pixel-wrangling-instructions
[src/content/test/gpu/gpu_tests/test_expectations]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/test_expectations
[webgl_conformance_expectations.txt]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/test_expectations/webgl_conformance_expectations.txt
[gpu_process_expectations.txt]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/test_expectations/gpu_process_expectations.txt
[pixel_expectations.txt]: https://chromium.googlesource.com/chromium/src/+/master/content/test/gpu/gpu_tests/test_expectations/pixel_expectations.txt
[stamping out flakiness]: gpu_testing.md#Stamping-out-Flakiness
[gtest-DISABLED]: https://github.com/google/googletest/blob/master/googletest/docs/AdvancedGuide.md#temporarily-disabling-tests
[rendering_representative_perf_tests]: ../testing/rendering_representative_perf_tests.md#Updating-Expectations

### When Bots Misbehave (SSHing into a bot)

1. See the [Chrome Internal GPU Pixel Wrangling Instructions] for information
    on ssh'ing in to the GPU bots.

### Reproducing WebGL conformance test failures locally

1. From the buildbot build output page, click on the failed shard to get to
    the swarming task page. Scroll to the bottom of the left panel for a
    command to run the task locally. This will automatically download the build
    and any other inputs needed.
2. Alternatively, to run the test on a local build, pass the arguments
    `--browser=exact --browser-executable=/path/to/binary` to
    `content/test/gpu/run_gpu_integration_test.py`.
    Also see the [telemetry documentation].

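As a rough illustration of the second option, the command line for a local
build could be assembled like this. The helper name `local_webgl_command` and
the binary path `out/Release/chrome` are hypothetical, and passing
`webgl_conformance` as a positional suite name is an assumption that this page
does not state; check the script's `--help` output before relying on it.

```python
import shlex


def local_webgl_command(browser_executable, suite="webgl_conformance"):
    """Return the argv for running a GPU integration suite on a local build."""
    if not browser_executable:
        raise ValueError("a path to the browser binary is required")
    return [
        "content/test/gpu/run_gpu_integration_test.py",
        suite,  # Assumed positional suite name; not confirmed by this page.
        "--browser=exact",
        f"--browser-executable={browser_executable}",
    ]


# Hypothetical path to a locally built binary.
print(shlex.join(local_webgl_command("out/Release/chrome")))
```

The printed line is meant to be run from the `src` directory of a Chromium
checkout.
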
[telemetry documentation]: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/docs/run_benchmarks_locally.md

## Modifying the GPU Pixel Wrangling Rotation

You may find yourself needing to modify the current rotation, whether to extend
the rotation or because scheduling conflicts arise.

For scheduling conflicts you can swap your shift with another wrangler. A good
approach is to look at the rotation calendar and find someone whose dates are
close to yours. Reach out to them, as they will often be willing to swap.

To actually modify the rotation, see the
[Chrome Internal GPU Pixel Wrangling Instructions] for information.