# Chrome Network Bug Triage : Suggested Workflow

[TOC]

## Identifying unlabeled network bugs on the tracker

* Look at new unconfirmed bugs since noon PST on the last triager's rotation.
  [Use this issue tracker
  query](https://bugs.chromium.org/p/chromium/issues/list?q=status%3Aunconfirmed&sort=-id&num=1000).

* Read the title of the bug.

* If a bug looks like it might be network related, middle click (or
  command-click on OSX) to open it in a new tab.

* If a user provides a crash ID for a bug that could be net-related, look at
  the crash stack at
  [go/crash](https://goto.google.com/crash), and see if it looks to be network
  related. Be sure to check whether other bug reports have that stack trace,
  and mark the bug as a dupe if so. Even if the bug isn't network related,
  paste the stack trace in the bug, so no one else has to look up the crash
  stack from the ID.
    * If there's just a blank form and a crash ID, ignore the bug.

* If network causes are possible, ask for a net-internals log (if it's not a
  browser crash) and attach the most specific Internals>Network subcomponent
  that's applicable. If there isn't an applicable narrower component or a
  clear owner for the issue, or if there are multiple possibilities, attach
  the Internals>Network component and proceed with further investigation.

* If non-network causes also seem possible, attach those components as well.

## Investigate UMA notifications

For each alert that fires, determine if it's a real alert and file a bug if so.

* Don't file if the alert is coincident with a major volume change. The volume
  at a particular date can be determined by hovering the mouse over the
  appropriate location on the alert line.

* Don't file if the alert is on a graph with very low volume (< ~200 data
  points); it's probably noise, and we probably don't care even if it isn't.

* Don't file if the graph is really noisy (but eyeball it to decide whether
  there is an important underlying shift beneath the noise).

* Don't file if the alert is in the "Known Ignorable" list:
    * SimpleCache on Windows
    * DiskCache on Android

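The filing rules above can be read as a short checklist. The following is a
minimal sketch of that checklist, not a real alerting API: the `UmaAlert`
type, its field names, and `should_file_bug` are all hypothetical, and the
200-point threshold is the approximate figure quoted above. The "noisy" and
"underlying shift" fields stand in for the triager's own judgment.

```python
from dataclasses import dataclass

# Values taken from the rules above. All names are hypothetical.
MIN_VOLUME = 200
KNOWN_IGNORABLE = {("SimpleCache", "Windows"), ("DiskCache", "Android")}


@dataclass
class UmaAlert:
    area: str                       # e.g. "SimpleCache"
    platform: str                   # e.g. "Windows"
    volume: int                     # data points at the alert date
    major_volume_change: bool       # coincides with a major volume change
    noisy: bool                     # graph is really noisy
    underlying_shift: bool = False  # triager sees a real shift under the noise


def should_file_bug(alert: UmaAlert) -> bool:
    """Return True only if none of the "don't file" rules apply."""
    if alert.major_volume_change:
        return False
    if alert.volume < MIN_VOLUME:
        return False
    if alert.noisy and not alert.underlying_shift:
        return False
    if (alert.area, alert.platform) in KNOWN_IGNORABLE:
        return False
    return True
```
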
## Investigating component=Internals>Network bugs

* Note that you may want to investigate Needs-Feedback bugs first, as
  that may result in some bugs being added to this list.

* It's recommended that while on triage duty, you subscribe to the
  Internals>Network component (but not its subcomponents). To do this, go
  to the issue tracker and then click "Saved Queries".
  Add a query with these settings:
    * Saved query name: Network Bug Triage
    * Project: chromium
    * Query: component=Internals>Network
    * Subscription options: Notify Immediately

* Look through unconfirmed and untriaged component=Internals>Network bugs,
  prioritizing those updated within the last week. [Use this issue tracker
  query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified).

* If more information is needed from the reporter, ask for it and add the
  Needs-Feedback label.

* While investigating a new issue, change the status to Untriaged.

* If a bug is a potential security issue (allows code execution from a remote
  site, allows crossing security boundaries, unchecked array bounds, etc.),
  mark it Type-Bug-Security. If it has privacy implications (history or
  cookies discoverable by an entity that shouldn't be able to see them,
  incognito state being saved in memory or on disk beyond the lifetime of
  incognito tabs, etc.), mark it with component Privacy.

* For bugs that already have a more specific network component, go ahead and
  remove the Internals>Network component to get them off the next triager's
  radar and move on.

* Try to figure out if it's really a network bug. See the common non-network
  components section for descriptions of common components for issues
  incorrectly tagged as Internals>Network.

* If it's not, attach appropriate labels/components and go no further.

* If it may be a network bug, attach any additional components that may be
  relevant, and continue investigating. Once you either determine it's a
  non-network bug, or identify accurate, more specific network components,
  your job is done, though you should still ask for a net-internals dump if it
  seems likely to be useful.

* Note that Chrome-OS-specific network-related code (captive portal detection,
  connectivity detection, login, etc.) may not all have appropriate more
  specific subcomponents, but is not in areas handled by the network stack
  team. Just make sure those bugs have the OS-Chrome label, and any more
  specific labels if applicable, and then move on.

* Gather data and investigate.
    * Remember to add the Needs-Feedback label whenever waiting for the user to
      respond with more information, and remove it when not waiting on the
      user.
    * Try to reproduce locally. If you can, and it's a regression, use
      src/tools/bisect-builds.py to figure out when it regressed.
    * Ask the user for more data as needed (net-internals dumps, repro case,
      crash ID from about:crashes, run tests, etc.).
    * If asking for an about:net-internals dump, provide this link:
      https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
      You can also grab the link from about:net-internals, as needed.

* Try to figure out what's going on, and which more specific network component
  is most appropriate.

* If it's a regression, browse through the git history of relevant files to try
  to figure out when it regressed. CC authors / primary reviewers of any
  strongly suspect CLs.

* If you are having trouble with an issue, particularly with understanding
  net-internals logs, email the public [email protected] list for help
  debugging. If it's a crasher, or for some other reason discussion needs to
  be done in private, use chrome-network-debugging@google.com. TODO(mmenke):
  Write up a net-internals tips and tricks doc.

* If it appears to be a bug in the unowned core of the network stack (i.e. no
  subcomponent applies, or only the Internals>Network>HTTP subcomponent
  applies, and there's no clear owner), try to figure out the exact cause.

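The issue tracker query linked earlier in this section is URL-encoded, which
makes it hard to see what it actually searches for. As a small illustration,
the sketch below decodes the query string (everything after the `?` in that
link) with Python's standard library and lists the individual search terms.
The variable names are arbitrary.

```python
from urllib.parse import parse_qs

# Query string copied from the tracker link in this section.
QUERY = ("can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,"
         "Untriaged+-label:Needs-Feedback&sort=-modified")

params = parse_qs(QUERY)

# The "q" parameter holds space-separated search terms.
search_terms = params["q"][0].split(" ")

for term in search_terms:
    print(term)
# Prints:
#   component=Internals>Network
#   status:Unconfirmed,Untriaged
#   -label:Needs-Feedback
```
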
## Looking for new crashers

1. Go to [go/chromecrash](https://goto.google.com/chromecrash).

2. For each platform, look through the releases to decide which ones to
    investigate. As per [bug-triage.md](bug-triage.md), this should be the most
    recent canary, the previous canary (if the most recent is less than a day
    old), and any of dev/beta/stable that were released in the last couple of
    days.

3. For each release, in the "Process Type" frame, click on "browser".

4. At the bottom of the "Magic Signature" frame, click "limit 1000" (or reduce
    the limit to 100 first, as that's all the triager needs to look at).
    Reported crashers are sorted in decreasing order of the number of reports for
    that crash signature.

5. Search the page for *"net::"*.

6. For each found signature:
    * Ignore signatures that only occur once or twice, as memory corruption can
      easily cause one-off failures when the sample size is large enough. Also
      ignore crashers that are not in the top 100 for that platform / release.
    * If there is a bug already filed, make sure it correctly describes the
      current bug (e.g. not closed, or not describing a long-past issue), and
      make sure that if it is a *net* bug, that it is labeled as such.
    * Ignore signatures that only come from one or two client IDs, as individual
      machine malware and breakage can cause one-off failures.
    * Click on the number of reports field to see details of the crash. Ignore
      it if it doesn't appear to be a network bug.
    * Otherwise, file a new bug directly from chromecrash.
    * For each bug you file, include the following information:
        * The backtrace. Note that the backtrace should not be added to the
          bug if Restrict-View-Google isn't set on the bug as it may contain
          PII. Filing the bug from the crash reporter should do this
          automatically, but check.
        * The channel in which the bug is seen (canary/dev/beta/stable), and its
          rank among crashers in the channel.
        * The frequency of this signature in recent releases. This information
          is available by:
            1. Clicking on the signature in the "Magic Signature" list.
            2. Clicking "Edit" on the dremel query at the top of the page.
            3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
               "Update".
            4. Clicking "Limit 1000" in the Product Version list in the
               resulting page (without this, the listing will be restricted to
               the releases in which the signature is most common, which will
               often not include the canary/dev release being investigated).
            5. Choosing some subset of that list, or all of it, to include in
               the bug. Make sure to indicate if there is a defined point in the
               past before which the signature is not present.

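The selection rules in steps 2, 5, and 6 can be summarized in code. The
following is only a sketch of that logic over hand-entered data; it does not
talk to chromecrash, and the `Release` and `Signature` types, their fields,
and both function names are hypothetical. It also assumes that "the last
couple of days" means two days. Checking existing bugs and deciding whether a
crash really is a network bug remain manual steps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Release:
    channel: str        # "canary", "dev", "beta", or "stable"
    version: str
    released: datetime


@dataclass
class Signature:
    name: str           # magic signature text
    rank: int           # 1-based rank among crashers for the platform/release
    reports: int        # number of crash reports
    client_ids: int     # number of distinct client IDs reporting it


def releases_to_check(releases, now, recent=timedelta(days=2)):
    """Step 2: pick the releases worth investigating for one platform."""
    canaries = sorted((r for r in releases if r.channel == "canary"),
                      key=lambda r: r.released, reverse=True)
    picked = canaries[:1]
    # Add the previous canary when the newest is less than a day old.
    if len(canaries) > 1 and now - canaries[0].released < timedelta(days=1):
        picked.append(canaries[1])
    # "Last couple of days" is interpreted as `recent` (an assumption).
    picked += [r for r in releases
               if r.channel != "canary" and now - r.released <= recent]
    return picked


def signatures_to_check(signatures):
    """Steps 5-6: keep net:: signatures that pass the ignore rules."""
    return [s for s in signatures
            if "net::" in s.name
            and s.rank <= 100       # only the top 100 crashers
            and s.reports > 2       # skip crashes seen only once or twice
            and s.client_ids > 2]   # skip crashes from one or two clients
```
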
As an alternative to the manual steps above, you can use [Eric Roman's new crash
tool](https://ericroman.users.x20web.corp.google.com/www/net-crash-triage/index.html)
(internal link). Note that it isn't a perfect fit with the triage
responsibilities, specifically:

* It only shows Windows releases; Android, iOS, and WebView are
  usually different, and Mac is sometimes different.
* The instructions are to look at the latest canary which has a day's
  worth of data. If canaries are being pushed fast, that may be more
  than one canary into the past, and hence not visible on the tool.
* Eric's tool filters based on files in "src/net" rather than looking
  for magic signatures that include the string "net::" ("src/net" is
  probably the better filter).

## Investigating crashers

* Only investigate crashers that are still occurring, as identified by the
  section above. If a search on go/crash indicates a crasher is no longer
  occurring, mark it as WontFix.

* On Windows, you may want to look for weird DLLs associated with the crashes.
  This generally needs crashes from a fair number of different users to reach
  any conclusions.
    * To get a list of loaded modules in related crash dumps, select
      modules->3rd party in the left pane. It can be difficult to distinguish
      between safe DLLs and those likely to cause problems, but even if you're
      not that familiar with Windows, some may stick out. Anti-virus programs,
      download managers, and more gray hat badware often have meaningful DLL
      names or DLL paths (generally product names or company names). If you
      see one of these in a significant number of the crash dumps, it may well
      be the cause.
    * You can also try selecting the "has malware" option, though that's much
      less reliable than looking manually.

* See if the same users are repeatedly running into the same issue. This can
  be accomplished by searching for (or clicking on) the client ID associated
  with a crash report, and seeing if there are multiple reports for the same
  crash. If this is the case, it may also be malware, or an issue with an
  unusual system/chrome/network config.

* Dig through crash reports to figure out when the crash first appeared, and
  dig through revision history in related files to try to locate a suspect CL.
  TODO(mmenke): Add more detail here.

* Load crash dumps, try to figure out a cause. See
  http://www.chromium.org/developers/crash-reports for more information