# Chrome Network Bug Triage: Suggested Workflow

[TOC]

## Identifying unlabeled network bugs on the tracker

* Look at new unconfirmed bugs filed since noon PST on the last triager's
  rotation. [Use this issue tracker
  query](https://bugs.chromium.org/p/chromium/issues/list?q=status%3Aunconfirmed&sort=-id&num=1000).

* Read the title of the bug.

* If a bug looks like it might be network related, middle-click (or
  command-click on macOS) to open it in a new tab.

* If a user provides a crash ID for a bug that could be net-related, look at
  the crash stack at [go/crash](https://goto.google.com/crash) and see whether
  it looks network related. Be sure to check whether other bug reports have
  the same stack trace, and mark the bug as a duplicate if so. Even if the bug
  isn't network related, paste the stack trace into the bug so that no one
  else has to look up the crash stack from the ID.
  * If there's just a blank form and a crash ID, ignore the bug.

* If network causes are possible, ask for a net-internals log (if it's not a
  browser crash) and attach the most specific network component that applies.
  If there is no applicable narrower component, there is no clear owner for
  the issue, or there are multiple possibilities, attach the
  Internals>Network component and proceed with further investigation.

* If non-network causes also seem possible, attach those components as well.

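The tracker link in the first step above is an ordinary search URL, so a triager who wants a variation (a different sort order or page size) can assemble one. The following is a minimal sketch, not an official tool: the function name `tracker_query_url` and its defaults are hypothetical, and it assumes the issue list keeps accepting the `q`, `sort`, and `num` parameters seen in that link.

```python
from urllib.parse import urlencode

# Issue-list endpoint taken from the link in the first step above.
TRACKER_LIST_URL = "https://bugs.chromium.org/p/chromium/issues/list"


def tracker_query_url(query, sort="-id", num=1000):
    """Build an issue-list URL for a tracker search string (hypothetical helper)."""
    params = {"q": query, "sort": sort, "num": num}
    # urlencode percent-escapes ':' as %3A, matching the link above.
    return f"{TRACKER_LIST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(tracker_query_url("status:unconfirmed"))
```

Passing `"status:unconfirmed"` with the defaults reproduces the query string of the link above; other search strings go through the same escaping.
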
## Investigate UMA notifications

For each alert that fires, determine whether it's a real alert, and file a bug
if so.

* Don't file if the alert coincides with a major volume change. The volume
  on a particular date can be determined by hovering the mouse over the
  appropriate location on the alert line.

* Don't file if the alert is on a graph with very low volume (< ~200 data
  points); it's probably noise, and we probably don't care even if it isn't.

* Don't file if the graph is really noisy (but eyeball it to decide whether
  there is an important underlying shift beneath the noise).

* Don't file if the alert is in the "Known Ignorable" list:
  * SimpleCache on Windows
  * DiskCache on Android

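The four "don't file" rules above amount to a short checklist. The sketch below writes them down as a predicate purely for illustration: the `Alert` fields and `should_file_bug` are hypothetical names, the 200-point floor is the approximate figure from the list, and the "noisy" flag still stands for the triager's own eyeball judgment rather than anything computed.

```python
from dataclasses import dataclass

# Entries from the "Known Ignorable" list above, as (area, platform) pairs.
KNOWN_IGNORABLE = {("SimpleCache", "Windows"), ("DiskCache", "Android")}

# Approximate low-volume floor from the list above ("< ~200 data points").
MIN_DATA_POINTS = 200


@dataclass
class Alert:
    """Hypothetical summary of one UMA alert, filled in by the triager."""
    area: str
    platform: str
    data_points: int
    coincides_with_volume_change: bool
    graph_too_noisy: bool  # the triager's judgment after eyeballing the graph


def should_file_bug(alert):
    """Return True only if none of the "don't file" rules applies."""
    if alert.coincides_with_volume_change:
        return False
    if alert.data_points < MIN_DATA_POINTS:
        return False
    if alert.graph_too_noisy:
        return False
    if (alert.area, alert.platform) in KNOWN_IGNORABLE:
        return False
    return True
```
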
## Investigating component=Internals>Network bugs

* Note that you may want to investigate Needs-Feedback bugs first, as
  that may result in some bugs being added to this list.

* While on triage duty, it's recommended that you subscribe to the
  Internals>Network component (but not its subcomponents). To do this, go
  to the issue tracker and click "Saved Queries". Add a query with these
  settings:
  * Saved query name: Network Bug Triage
  * Project: chromium
  * Query: component=Internals>Network
  * Subscription options: Notify Immediately

* Look through unconfirmed and untriaged component=Internals>Network bugs,
  prioritizing those updated within the last week. [Use this issue tracker
  query](https://bugs.chromium.org/p/chromium/issues/list?can=2&q=component%3DInternals%3ENetwork+status%3AUnconfirmed,Untriaged+-label:Needs-Feedback&sort=-modified).

* If more information is needed from the reporter, ask for it and add the
  Needs-Feedback label.

* While investigating a new issue, change the status to Untriaged.

* If a bug is a potential security issue (allows code execution from a remote
  site, allows crossing security boundaries, unchecked array bounds, etc.),
  mark it Type-Bug-Security. If it has privacy implications (history or
  cookies discoverable by an entity that shouldn't be able to see them,
  incognito state being saved in memory or on disk beyond the lifetime of
  incognito tabs, etc.), mark it with the Privacy component.

* For bugs that already have a more specific network component, go ahead and
  remove the Internals>Network component to get them off the next triager's
  radar, and move on.

* Try to figure out whether it's really a network bug. See the common
  non-network components section for descriptions of the components that
  commonly apply to issues incorrectly tagged as Internals>Network.

* If it's not, attach appropriate labels/components and go no further.

* If it may be a network bug, attach any additional components that may be
  relevant, and continue investigating. Once you either determine that it's a
  non-network bug or identify accurate, more specific network components, your
  job is done, though you should still ask for a net-internals dump if it seems
  likely to be useful.

* Note that Chrome-OS-specific network-related code (captive portal detection,
  connectivity detection, login, etc.) may not all have appropriate, more
  specific subcomponents, but it is not in areas handled by the network stack
  team. Just make sure those bugs have the OS-Chrome label, and any more
  specific labels if applicable, and then move on.

* Gather data and investigate.
  * Remember to add the Needs-Feedback label whenever waiting for the user to
    respond with more information, and remove it when not waiting on the
    user.
  * Try to reproduce locally. If you can, and it's a regression, use
    src/tools/bisect-builds.py to figure out when it regressed.
  * Ask the user for more data as needed (net-internals dumps, repro case,
    crash ID from about:crashes, results of running tests, etc.).
  * If asking for an about:net-internals dump, provide this link:
    https://sites.google.com/a/chromium.org/dev/for-testers/providing-network-details.
    You can grab the link from about:net-internals as needed.

* Try to figure out what's going on, and which more specific network component
  is most appropriate.

* If it's a regression, browse through the git history of relevant files to try
  to figure out when it regressed. CC authors / primary reviewers of any
  strongly suspect CLs.

* If you are having trouble with an issue, particularly if you need help
  understanding net-internals logs, email the public [email protected] list for
  help debugging. If it's a crasher, or if the discussion needs to happen in
  private for some other reason, use chrome-network-debugging@google.com.
  TODO(mmenke): Write up a net-internals tips and tricks doc.

* If it appears to be a bug in the unowned core of the network stack (i.e. no
  subcomponent applies, or only the Internals>Network>HTTP subcomponent
  applies, and there's no clear owner), try to figure out the exact cause.

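One rule in the list above is mechanical: a bug that already carries a more specific network component should lose the bare Internals>Network component. As a rough illustration only (the tracker does not expose this function; `drop_redundant_root` is a hypothetical name and components are modeled as plain strings), the rule looks like this:

```python
NETWORK_ROOT = "Internals>Network"


def drop_redundant_root(components):
    """Remove Internals>Network when a narrower network component is present.

    `components` is a list of component strings as shown on a bug. The list
    is returned unchanged (as a copy) if no narrower component is attached.
    """
    has_subcomponent = any(c.startswith(NETWORK_ROOT + ">") for c in components)
    if not has_subcomponent:
        return list(components)
    return [c for c in components if c != NETWORK_ROOT]
```

A bug tagged with both Internals>Network and Internals>Network>HTTP would keep only the latter, while a bug tagged with Internals>Network alone stays on the triage list.
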
## Looking for new crashers

1. Go to [go/chromecrash](https://goto.google.com/chromecrash).

2. For each platform, look through the releases to decide which ones to
   investigate. As per [bug-triage.md](bug-triage.md), these should be the most
   recent canary, the previous canary (if the most recent is less than a day
   old), and any of dev/beta/stable that were released in the last couple of
   days.

3. For each release, in the "Process Type" frame, click on "browser".

4. At the bottom of the "Magic Signature" frame, click "limit 1000" (or reduce
   the limit to 100 first, as that's all the triager needs to look at).
   Reported crashers are sorted in decreasing order of the number of reports for
   that crash signature.

5. Search the page for *"net::"*.

6. For each found signature:
   * Ignore signatures that only occur once or twice, as memory corruption can
     easily cause one-off failures when the sample size is large enough. Also
     ignore crashers that are not in the top 100 for that platform / release.
   * If there is a bug already filed, make sure it correctly describes the
     current bug (e.g. it is not closed, and does not describe a long-past
     issue), and make sure that if it is a *net* bug, it is labeled as such.
   * Ignore signatures that only come from one or two client IDs, as malware
     or breakage on an individual machine can cause one-off failures.
   * Click on the number-of-reports field to see details of the crash. Ignore
     it if it doesn't appear to be a network bug.
   * Otherwise, file a new bug directly from chromecrash.
   * For each bug you file, include the following information:
     * The backtrace. Note that the backtrace should not be added to the
       bug unless Restrict-View-Google is set on the bug, as it may contain
       PII. Filing the bug from the crash reporter should do this
       automatically, but check.
     * The channel in which the bug is seen (canary/dev/beta/stable), and its
       rank among crashers in the channel.
     * The frequency of this signature in recent releases. This information
       is available by:
       1. Clicking on the signature in the "Magic Signature" list.
       2. Clicking "Edit" on the dremel query at the top of the page.
       3. Removing the "product.version='X.Y.Z.W' AND" string and clicking
          "Update".
       4. Clicking "Limit 1000" in the Product Version list in the
          resulting page (without this, the listing will be restricted to
          the releases in which the signature is most common, which will
          often not include the canary/dev release being investigated).
       5. Choosing some subset of that list, or all of it, to include in the
          bug. Make sure to indicate if there is a defined point in the
          past before which the signature is not present.

As an alternative to the above, you can use [Eric Roman's new crash
tool](https://ericroman.users.x20web.corp.google.com/www/net-crash-triage/index.html)
(internal link). Note that it isn't a perfect fit for the triage
responsibilities. Specifically:

* It only shows Windows releases; Android, iOS, and WebView are
  usually different, and Mac is sometimes different.
* The instructions are to look at the latest canary that has a day's
  worth of data. If canaries are being pushed fast, that may be more
  than one canary into the past, and hence not visible in the tool.
* Eric's tool filters based on files in "src/net" rather than looking
  for magic signatures that include the string "net::" ("src/net" is
  probably the better filter).

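The per-signature "ignore" rules in step 6 of the manual procedure can be read as a filter over the signature list. The sketch below is illustrative only: `Signature` and `worth_investigating` are hypothetical names, the data is assumed to have been copied by hand from chromecrash (no crash-server API is implied), and the thresholds simply encode "once or twice", "one or two client IDs", and "top 100" from that step.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    """Hypothetical record for one row of the "Magic Signature" list."""
    name: str
    rank: int          # 1-based position in the list for this platform / release
    report_count: int  # number of crash reports with this signature
    client_ids: int    # number of distinct client IDs among those reports


def worth_investigating(sig, max_rank=100, min_reports=3, min_clients=3):
    """Apply the "ignore" rules from step 6; True means look at the crash."""
    return (
        "net::" in sig.name
        and sig.rank <= max_rank
        and sig.report_count >= min_reports
        and sig.client_ids >= min_clients
    )
```

A signature that passes this filter still needs the manual checks from step 6: whether a bug is already filed, and whether the stack really points at the network stack.
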
## Investigating crashers

* Only investigate crashers that are still occurring, as identified by the
  section above. If a search on go/crash indicates a crasher is no longer
  occurring, mark it as WontFix.

* On Windows, you may want to look for weird DLLs associated with the crashes.
  This generally needs crashes from a fair number of different users to reach
  any conclusions.
  * To get a list of loaded modules in related crash dumps, select
    modules->3rd party in the left pane. It can be difficult to distinguish
    between safe DLLs and those likely to cause problems, but even if you're
    not that familiar with Windows, some may stick out. Anti-virus programs,
    download managers, and more gray-hat badware often have meaningful DLL
    names or DLL paths (generally product names or company names). If you
    see one of these in a significant number of the crash dumps, it may well
    be the cause.
  * You can also try selecting the "has malware" option, though that's much
    less reliable than looking manually.

* See if the same users are repeatedly running into the same issue. This can
  be accomplished by searching for (or clicking on) the client ID associated
  with a crash report, and seeing if there are multiple reports for the same
  crash. If this is the case, it may also be malware, or an issue with an
  unusual system/Chrome/network configuration.

* Dig through crash reports to figure out when the crash first appeared, and
  dig through the revision history of related files to try to locate a suspect
  CL. TODO(mmenke): Add more detail here.

* Load crash dumps and try to figure out a cause. See
  http://www.chromium.org/developers/crash-reports for more information
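
The two counting exercises described above (repeat client IDs, and third-party modules that show up in many dumps) are easy to do by hand for a few reports and tedious for many. The following sketch assumes the reports have been exported into a list of dictionaries with `client_id` and `modules` keys; that shape, and both function names, are hypothetical and not a format produced by go/crash.

```python
from collections import Counter


def repeat_clients(reports, min_reports=2):
    """Map each client ID with at least `min_reports` reports to its count."""
    counts = Counter(r["client_id"] for r in reports)
    return {cid: n for cid, n in counts.items() if n >= min_reports}


def module_frequency(reports):
    """Return the fraction of reports in which each third-party module appears."""
    counts = Counter()
    for report in reports:
        # Use a set so a module listed twice in one dump is counted once.
        counts.update(set(report["modules"]))
    total = len(reports)
    if total == 0:
        return {}
    return {module: n / total for module, n in counts.items()}
```

A module present in most dumps from many different clients is a candidate cause; a high fraction driven by one or two repeat clients is weaker evidence, which is why the two functions are meant to be read together.
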