# CQ

This document describes how the Chromium Commit Queue (CQ) is structured and
managed. It is specific to the Chromium CQ. Questions about other CQs should
be directed to infra-dev@chromium.org.

[TOC]

## Purpose

The Chromium CQ exists to test developer changes before they land in
[chromium/src](https://chromium.googlesource.com/chromium/src/). It runs all the
test suites that a given CL affects, and ensures that they all pass.

## Options

The Chromium CQ supports a variety of options that can change what it checks.

> These options are supported via git footers. They must appear in the last
> paragraph of your commit message to be used. See `git help footers` or
> [git_footers.py][1] for more information.

* `Commit: false`

  You can mark a CL with this if you are working on experimental code and do not
  want to risk accidentally submitting it via the CQ. The CQ will immediately
  stop processing the change if it contains this option.

* `Cq-Include-Trybots: <trybots>`

  This flag allows you to specify some additional bots to run for this CL, in
  addition to the default bots. The format for the list of trybots is
  "bucket:trybot1,trybot2;bucket2:trybot3".

* `No-Presubmit: true`

  If you want to skip the presubmit check, you can add this line, and the commit
  queue won't run the presubmit for your change. This should only be used when
  there's a bug in the PRESUBMIT scripts. Please check that there's a bug filed
  against the bad script, and if there isn't, [file one](https://crbug.com/new).

* `No-Tree-Checks: true`

  Add this line if you want to skip the tree status checks. This means the CQ
  will commit a CL even if the tree is closed. This is strongly discouraged,
  since the tree is usually closed for a reason. However, in rare cases this is
  acceptable, primarily to fix build breakages (i.e., your CL will help in
  reopening the tree).

* `No-Try: true`

  This should only be used for reverts to green the tree, since it skips try
  bots and might therefore break the tree. You shouldn't use this otherwise.

* `Tbr: <username>`

  See the [policy](https://chromium.googlesource.com/chromium/src/+/master/docs/code_reviews.md#TBR-To-Be-Reviewed)
  on when it's acceptable to use TBR ("To be reviewed"). If a change has a TBR
  line with a valid reviewer, the CQ will skip checks for LGTMs.

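To make the footer rules above concrete, here is a minimal Python sketch. It reads footers from the last paragraph of a commit message and splits a `Cq-Include-Trybots` value into buckets and trybots. The regular expression and both helper names are assumptions made for illustration; [git_footers.py][1] remains the authoritative parser and may behave differently in edge cases.

```python
import re

# Simplified "Key: value" footer pattern. This is an assumption for
# illustration; depot_tools' git_footers.py is the authoritative parser.
FOOTER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


def parse_footers(message):
    """Return {key: [values]} for footers in the last paragraph of a message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    footers = {}
    if not paragraphs:
        return footers
    # Only the last paragraph is inspected, as described above.
    for line in paragraphs[-1].splitlines():
        match = FOOTER_RE.match(line.strip())
        if match:
            footers.setdefault(match.group(1), []).append(match.group(2))
    return footers


def parse_include_trybots(value):
    """Parse "bucket:trybot1,trybot2;bucket2:trybot3" into {bucket: [trybots]}."""
    result = {}
    for group in value.split(";"):
        bucket, _, bots = group.strip().partition(":")
        result.setdefault(bucket, []).extend(
            bot.strip() for bot in bots.split(",") if bot.strip())
    return result


# Hypothetical commit message; the footer values reuse the format shown above.
message = """Fix a crash in the widget code

Longer description of the change.

Cq-Include-Trybots: bucket:trybot1,trybot2;bucket2:trybot3
No-Presubmit: true
"""

footers = parse_footers(message)
trybots = parse_include_trybots(footers["Cq-Include-Trybots"][0])
print(footers)
print(trybots)
```

A footer placed in an earlier paragraph would not be picked up by this sketch, which mirrors the requirement that footers appear in the last paragraph.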
## FAQ

### What exactly does the CQ run?

CQ runs the jobs specified in [commit-queue.cfg][2]. See
[`cq-builders.md`](https://chromium.googlesource.com/chromium/src/+/master/src/infra/config/generated/cq-builders.md)
for an auto-generated file with links to information about the builders on the
CQ.

Some of these jobs are experimental. This means they are executed on a
percentage of CQ builds, and the outcome of the build doesn't affect whether
the CL can land. See the schema linked at the top of the config file for more
information on what the fields in the config do.

The CQ has the following structure:

* Compile all test suites that might be affected by the CL.
* Run all test suites that might be affected by the CL.
  * Many test suites are divided into shards. Each shard is run as a separate
    swarming task.
  * These steps are labeled '(with patch)'.
* Retry each shard that has a test failure. The retry has the exact same
  configuration as the original run. No recompile is necessary.
  * If the retry succeeds, then the failure is ignored.
  * These steps are labeled '(retry shards with patch)'.
  * It's important to retry with the exact same configuration. Attempting to
    retry the failing test in isolation often produces different behavior.
* Recompile each failing test suite without the CL. Rerun each failing test
  suite in isolation.
  * If the rerun also fails without the CL, then the failure is ignored, as
    it's assumed that the test is broken/flaky on tip of tree.
  * These steps are labeled '(without patch)'.
* Fail the build if there are tests which failed in both '(with patch)' and
  '(retry shards with patch)' but passed in '(without patch)'.

### Why did my CL fail the CQ?

Please follow these general guidelines:

1. Check to see if your patch caused the build failures, and fix if possible.
1. If compilation or individual tests are failing on one or more CQ bots and you
   suspect that your CL is not responsible, please contact your friendly
   neighborhood sheriff by filing a
   [sheriff bug](https://bugs.chromium.org/p/chromium/issues/entry?template=Defect%20report%20from%20developer&labels=Sheriff-Chromium&summary=%5BBrief%20description%20of%20problem%5D&comment=What%27s%20wrong?).
   If the code in question has appropriate OWNERS, consider contacting or CCing
   them.
1. If other parts of CQ bot execution (e.g. `bot_update`) are failing, or you
   have reason to believe the CQ itself is broken, or you can't really
   tell what's wrong, please file a [trooper bug](https://g.co/bugatrooper).

In both cases, when filing bugs, please include links to the build and/or CL
(including relevant patchset information) in question.

### How do I add a new builder to the CQ?

There are several requirements for a builder to be added to the Commit Queue.

* All the code for this configuration must be in Chromium's public repository or
  brought in through [src/DEPS](../../DEPS).
* Setting up the build should be straightforward for a Chromium developer
  familiar with existing configurations.
* Tests should use existing test harnesses, e.g.
  [gtest](../../third_party/googletest).
* It should be possible for any committer to replicate any testing run; i.e.
  tests and their data must be in the public repository.
* Median cycle time needs to be under 40 minutes for trybots. 90th percentile
  should be around an hour (preferably shorter).
* Configurations need to catch enough failures to be worth adding to the CQ.
  Running builds on every CL requires a significant amount of compute resources.
  If a configuration only fails once every couple of weeks on the waterfalls,
  then it's probably not worth adding it to the commit queue.

Please email [email protected], who will approve new build configurations.

### How do I ensure a trybot runs on all changes to a specific directory?

Several builders are included in the CQ only for changes that affect specific
directories. These used to be configured via `Cq-Include-Trybots` footers
injected at CL upload time. They are now configured via `location_regexp` fields
in [commit-queue.cfg][2], e.g.

```
builders {
  name: "chromium/try/my-specific-trybot"
  location_regexp: ".+/[+]/path/to/my/specific/directory/.+"
}
```

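A pattern of this shape can be sanity-checked locally before it goes into the config. The Python sketch below rests on two assumptions that should be verified against the CQ config schema: that the pattern is matched in full, and that it is matched against a string of the form `<gerrit host>/<project>/+/<file path>`. Python's `re` module stands in for whatever regular expression engine the CQ uses, and the host, project, and file names are hypothetical.

```python
import re

# Pattern of the shape used in the config; "[+]" matches the literal "+"
# that separates the project from the file path.
LOCATION_REGEXP = r".+/[+]/path/to/my/specific/directory/.+"

# Hypothetical Gerrit host and project, for illustration only.
HOST_AND_PROJECT = "https://chromium-review.googlesource.com/chromium/src"


def builder_is_triggered(changed_files):
    """Return True if any changed file matches LOCATION_REGEXP in full."""
    return any(
        re.fullmatch(LOCATION_REGEXP, f"{HOST_AND_PROJECT}/+/{path}")
        for path in changed_files)


print(builder_is_triggered(["path/to/my/specific/directory/foo.cc"]))
print(builder_is_triggered(["some/other/directory/bar.cc"]))
```

Under these assumptions the first call matches and the second does not, so the builder would only be added for CLs that touch the directory.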
## Flakiness

The CQ can sometimes be flaky. Flakiness is when a test on the CQ fails, but
should have passed (commonly known as a false negative). There are a few common
causes of flaky tests on the CQ:

* Machine issues: weird system processes running, running out of disk space,
  etc.
* Test issues: individual tests not being independent and relying on the order
  of tests being run, not mocking out network traffic or other real world
  interactions.

The CQ mitigates flakiness by retrying failed tests. The core tradeoff in retry
policy is that adding retries increases the probability that a flaky test will
land on tip of tree sublinearly, but mitigates the impact of the flaky test on
unrelated CLs exponentially.

For example, imagine a CL that adds a test that fails with 50% probability. Even
with no retries, the test will land with 50% probability. Subsequently, 50% of
all unrelated CQ attempts would flakily fail. This effect is cumulative across
different flaky tests. Since the CQ has roughly 20,000 unique flaky tests,
without retries, pretty much no CL would ever pass the CQ.

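The tradeoff can be worked through numerically. The Python sketch below uses a deliberately simplified model: every attempt is independent, and a test passes the CQ if any attempt passes. It ignores the '(without patch)' step. The 0.1% per-test flake rate in the last line is an assumed figure for illustration; only the 50% example and the count of 20,000 flaky tests come from the text.

```python
def land_probability(p_fail, attempts):
    """Probability that a new flaky test passes at least once and lands."""
    return 1 - p_fail ** attempts


def unrelated_failure_probability(p_fail, attempts):
    """Probability that a landed flaky test fails every attempt on another CL."""
    return p_fail ** attempts


def pass_probability(p_flake, num_flaky_tests):
    """Probability that none of num_flaky_tests independent flaky tests fails."""
    return (1 - p_flake) ** num_flaky_tests


# One attempt means no retries; each extra attempt is one retry.
for attempts in (1, 2, 3):
    print(attempts,
          land_probability(0.5, attempts),
          unrelated_failure_probability(0.5, attempts))

# Assumed per-test flake rate of 0.1% with no retries.
print(pass_probability(0.001, 20_000))
```

With a 50% failure rate, going from one to three attempts raises the chance of the flaky test landing from 0.5 to 0.875, while the chance of it failing an unrelated CL drops from 0.5 to 0.125. The last line shows why retries are needed at all: even at the assumed 0.1% flake rate, the chance of 20,000 tests all passing in a single attempt is effectively zero.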
## Help!

Have other questions? Run into any issues with the CQ? Email
[email protected], or file a [trooper bug](https://g.co/bugatrooper).

[1]: https://chromium.googlesource.com/chromium/tools/depot_tools/+/HEAD/git_footers.py
[2]: ../../infra/config/commit-queue.cfg