Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 1 | # A Crash Course in Debugging with chrome://net-internals |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 2 | |
| 3 | This document is intended to help get people started debugging network errors |
Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 4 | with chrome://net-internals, with some commonly useful tips and tricks. This |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 5 | document is aimed more at how to get started using some of its features to |
| 6 | investigate bug reports, rather than as a feature overview. |
| 7 | |
mmenke | 212fe43 | 2016-03-10 16:51:33 | [diff] [blame] | 8 | It would probably be useful to read |
| 9 | [life-of-a-url-request.md](life-of-a-url-request.md) before this document. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 10 | |
| 11 | # What Data Net-Internals Contains |
| 12 | |
Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 13 | chrome://net-internals provides a view of browser activity from net/'s |
| 14 | perspective. For this reason, it lacks knowledge of tabs, navigation, frames, |
| 15 | resource types, etc. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 16 | |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 17 | The leftmost column presents a list of views. Most debugging is done with the |
| 18 | Events view, which will be all this document covers. |
| 19 | |
| 20 | The top level network stack object is the URLRequestContext. The Events view |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 21 | has information for all Chrome URLRequestContexts that are hooked up to the |
Eric Roman | b67de83b | 2019-06-22 00:12:32 | [diff] [blame] | 22 | single, global, NetLog object. This includes both incognito and |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 23 | non-incognito profiles, among other things. The Events view only shows events |
| 24 | for the period that net-internals was open and running, and is incrementally |
| 25 | updated as events occur. The code attempts to add a top level event for |
Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 26 | URLRequests that were active when the chrome://net-internals tab was opened, to |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 27 | help debug hung requests, but that's best-effort only, and only includes |
| 28 | requests for the current profile and the system URLRequestContext. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 29 | |
| 30 | The other views are all snapshots of the current state of the main |
| 31 | URLRequestContext's components, and are updated on a 5 second timer. These will |
Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 32 | show objects that were created before chrome://net-internals was opened. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 33 | |
| 34 | # Events vs Sources |
| 35 | |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 36 | The Events view shows events logged by the NetLog. The NetLog model is that |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 37 | long-lived network stack objects, called sources, emit events over their |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 38 | lifetime. A NetLogWithSource object contains a source ID, a NetLogSourceType, |
| 39 | and a pointer to the NetLog the source emits events to. |
| 40 | |
| 41 | The Events view has a list of sources in a column adjacent to the list of views. |
| 42 | Sources that include an event with a net_error parameter with negative value |
| 43 | (that is, some kind of ERR_) are shown with red background. Sources whose |
| 44 | opening event has not ended yet are shown with white background. Other events |
| 45 | have green background. The search queries corresponding to the first two kinds |
| 46 | are `is:error` and `is:active`. |
| 47 | |
| 48 | When one or more sources are selected, corresponding events show up in another |
| 49 | column to the right, sorted by source, and by time within each source. There |
| 50 | are two time values: t is measured from some reference point common to all |
| 51 | sources, and st is measured from the first event for each source. Time is |
| 52 | displayed in milliseconds. |
| 53 | |
| 54 | Since the network stack is asynchronous, events from different sources will |
| 55 | often be interlaced in time, but Events view does not feature showing events from |
| 56 | different sources ordered by time. Large time gaps in the event list of a |
| 57 | single source usually mean that time is spent in the context of another source. |
| 58 | |
| 59 | Some events come in pairs: a beginning and end event, between which other events |
| 60 | may occur. They are shown with + and - prefixes, respectively. The begin event |
| 61 | has a dt value which shows the duration. If the end event was captured, then |
| 62 | duration is calculated as the time difference between the begin and the end |
| 63 | events. Otherwise the time elapsed from the begin event until capturing |
| 64 | was stopped is displayed (a lower bound for actual duration), followed by a + |
| 65 | sign (for example, "dt=120+"). |
| 66 | |
| 67 | If there are no other events in between the begin and end, and the end event has |
| 68 | no parameters, then they are collapsed in a single line without a sign prefix. |
| 69 | |
| 70 | Some other events only occur at a single point in time, and will not have either |
| 71 | a sign prefix, or a dt duration value. |
| 72 | |
| 73 | Generally only one event can be occuring for a source at a time. If there can |
| 74 | be multiple events doing completely independent things, the code often uses new |
| 75 | sources to represent the parallelism. |
| 76 | |
| 77 | Most, but not all events correspond to a source. Exceptions are global events, |
| 78 | which have no source, and show up as individual entries in the source list. |
| 79 | Examples of global events include NETWORK_CHANGED, DNS_CONFIG_CHANGED, and |
| 80 | PROXY_CONFIG_CHANGED. |
| 81 | |
| 82 | # Common source types |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 83 | |
| 84 | "Sources" correspond to certain net objects, however, multiple layers of net/ |
| 85 | will often log to a single source. Here are the main source types and what they |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 86 | include (excluding HTTP2 [SPDY]/QUIC): |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 87 | |
| 88 | * URL_REQUEST: This corresponds to the URLRequest object. It includes events |
| 89 | from all the URLRequestJobs, HttpCache::Transactions, NetworkTransactions, |
David Benjamin | 4aa6b2a | 2019-01-09 16:38:00 | [diff] [blame] | 90 | HttpStreamRequests, HttpStream implementations, and HttpStreamParsers used to |
| 91 | service a response. If the URL_REQUEST follows HTTP redirects, it will include |
| 92 | each redirect. This is a lot of stuff, but generally only one object is doing |
| 93 | work at a time. This event source includes the full URL and generally includes |
| 94 | the request / response headers (except when the cache handles the response). |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 95 | |
David Benjamin | 4aa6b2a | 2019-01-09 16:38:00 | [diff] [blame] | 96 | * HTTP_STREAM_JOB: This corresponds to HttpStreamFactory::Job (note that one |
| 97 | Request can have multiple Jobs). It also includes its proxy and DNS lookups. |
| 98 | HTTP_STREAM_JOB log events are separate from URL_REQUEST because two stream |
| 99 | jobs may be created and races against each other, in some cases -- one for |
| 100 | QUIC, and one for HTTP. |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 101 | |
| 102 | One of the final events of this source, before the |
| 103 | HTTP_STREAM_JOB_BOUND_TO_REQUEST event, indicates how an HttpStream was |
| 104 | created: |
| 105 | |
| 106 | + A SOCKET_POOL_BOUND_TO_CONNECT_JOB event means that a new TCP socket was |
| 107 | created, whereas a SOCKET_POOL_REUSED_AN_EXISTING_SOCKET event indicates that |
| 108 | an existing TCP socket was reused for a non-HTTP/2 request. |
| 109 | |
| 110 | + An HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET event indicates that a |
| 111 | new HTTP/2 session was opened by this Job. |
| 112 | |
| 113 | + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION event indicates that the request |
| 114 | was served on a preexisting HTTP/2 session. |
| 115 | |
| 116 | + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION_FROM_IP_POOL event means that |
| 117 | the request was pooled to a preexisting HTTP/2 session which had a different |
| 118 | SpdySessionKey, but DNS resolution resulted in the same IP, and the |
| 119 | certificate matches. |
| 120 | |
| 121 | + There are currently no events logged for opening new QUIC sessions or |
| 122 | reusing existing ones. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 123 | |
davidben | b7048f09 | 2016-11-30 21:20:26 | [diff] [blame] | 124 | * \*_CONNECT_JOB: This corresponds to the ConnectJob subclasses that each socket |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 125 | pool uses. A successful CONNECT_JOB returns a SOCKET. The events here vary a |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 126 | lot by job type. Their main event is generally either to create a socket, or |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 127 | request a socket from another socket pool (which creates another CONNECT_JOB) |
| 128 | and then do some extra work on top of that -- like establish an SSL connection on |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 129 | top of a TCP connection. |
| 130 | |
| 131 | * SOCKET: These correspond to TCPSockets, but may also have other classes |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 132 | layered on top of them (like an SSLClientSocket). This is a bit different from |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 133 | the other classes, where the name corresponds to the topmost class, instead of |
| 134 | the bottommost one. This is largely an artifact of the fact the socket is |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 135 | created first, and then SSL (or a proxy connection) is layered on top of it. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 136 | SOCKETs may be reused between multiple requests, and a request may end up |
| 137 | getting a socket created for another request. |
| 138 | |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 139 | * HOST_RESOLVER_IMPL_JOB: These correspond to HostResolverImpl::Job. They |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 140 | include information about how long the lookup was queued, each DNS request that |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 141 | was attempted (with the platform or built-in resolver) and all the other sources |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 142 | that are waiting on the job. |
| 143 | |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 144 | When one source depends on another, the code generally logs an event at both |
| 145 | sources with a `source_dependency` value pointing to the other source. These |
| 146 | are clickable in the UI, adding the referred source to the list of selected |
| 147 | sources. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 148 | |
| 149 | # Debugging |
| 150 | |
| 151 | When you receive a report from the user, the first thing you'll generally want |
| 152 | to do find the URL_REQUEST[s] that are misbehaving. If the user gives an ERR_* |
| 153 | code or the exact URL of the resource that won't load, you can just search for |
| 154 | it. If it's an upload, you can search for "post", or if it's a redirect issue, |
| 155 | you can search for "redirect". However, you often won't have much information |
| 156 | about the actual problem. There are two filters in net-internals that can help |
| 157 | in a lot of cases: |
| 158 | |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 159 | * "type:URL_REQUEST is:error" will restrict the source list to URL_REQUEST |
| 160 | objects with an error of some sort. Cache errors are often non-fatal, so you |
| 161 | should generally ignore those, and look for a more interesting one. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 162 | |
mmenke | 212fe43 | 2016-03-10 16:51:33 | [diff] [blame] | 163 | * "type:URL_REQUEST sort:duration" will show the longest-lived requests first. |
| 164 | This is often useful in finding hung or slow requests. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 165 | |
| 166 | For a list of other filter commands, you can mouse over the question mark on |
Matt Menke | ae32850c | 2018-03-20 16:40:55 | [diff] [blame] | 167 | chrome://net-internals. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 168 | |
| 169 | Once you locate the problematic request, the next is to figure out where the |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 170 | problem is -- it's often one of the last events, though it could also be related |
| 171 | to response or request headers. You can use `source_dependency` links to |
| 172 | navigate between related sources. You can use the name of an event to search |
| 173 | for the code responsible for that event, and try to deduce what went wrong |
| 174 | before/after a particular event. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 175 | |
| 176 | Some things to look for while debugging: |
| 177 | |
| 178 | * CANCELLED events almost always come from outside the network stack. |
| 179 | |
| 180 | * Changing networks and entering / exiting suspend mode can have all sorts of |
| 181 | fun and exciting effects on underway network activity. Network changes log a |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 182 | top level NETWORK_CHANGED event. Suspend events are currently not logged. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 183 | |
David Benjamin | 9776ca2 | 2018-06-13 00:00:15 | [diff] [blame] | 184 | * URL_REQUEST_DELEGATE_\* / NETWORK_DELEGATE_\* / DELEGATE_INFO events mean a |
| 185 | URL_REQUEST is blocked on a URLRequest::Delegate or the NetworkDelegate, which |
| 186 | are implemented outside the network stack. A request will sometimes be CANCELED |
| 187 | here for reasons known only to the delegate. Or the delegate may cause a hang. |
| 188 | In general, to debug issues related to delegates, one needs to figure out which |
| 189 | method of which object is causing the problem. The object may be the a |
| 190 | NetworkDelegate, a ResourceThrottle, a ResourceHandler, the ResourceLoader |
| 191 | itself, or the ResourceDispatcherHost. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 192 | |
| 193 | * Sockets are often reused between requests. If a request is on a stale |
| 194 | (reused) socket, what was the previous request that used the socket, how long |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 195 | ago was it made? (Look at SOCKET_IN_USE events, and the HTTP_STREAM_JOBS they |
| 196 | point to via the `source_dependency` value.) |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 197 | |
| 198 | * SSL negotation is a process fraught with peril, particularly with broken |
| 199 | proxies. These will generally stall or fail in the SSL_CONNECT phase at the |
| 200 | SOCKET layer. |
| 201 | |
| 202 | * Range requests have magic to handle them at the cache layer, and are often |
| 203 | issued by the media and PDF code. |
| 204 | |
| 205 | * Late binding: HTTP_STREAM_JOBs are not associated with any CONNECT_JOB until |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 206 | a CONNECT_JOB actually connects. This is so the highest priority pending |
| 207 | HTTP_STREAM_JOB gets the first available socket (which may be a new socket, or |
| 208 | an old one that's freed up). For this reason, it can be a little tricky to |
| 209 | relate hung HTTP_STREAM_JOBs to CONNECT_JOBs. |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 210 | |
| 211 | * Each CONNECT_JOB belongs to a "group", which has a limit of 6 connections. If |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 212 | all CONNECT_JOBs belonging to a group (the CONNECT_JOB's description field) are |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 213 | stalled waiting on an available socket, the group probably has 6 sockets that |
bnc | 16cf6923 | 2016-12-12 16:28:10 | [diff] [blame] | 214 | that are hung -- either hung trying to connect, or used by stalled requests and |
mmenke | 429beb6c | 2015-12-15 03:11:18 | [diff] [blame] | 215 | thus outside the socket pool's control. |
| 216 | |
| 217 | * There's a limit on number of DNS resolutions that can be started at once. If |
| 218 | everything is stalled while resolving DNS addresses, you've probably hit this |
| 219 | limit, and the DNS lookups are also misbehaving in some fashion. |
| 220 | |
| 221 | # Miscellany |
| 222 | |
| 223 | These are just miscellaneous things you may notice when looking through the |
| 224 | logs. |
| 225 | |
| 226 | * URLRequests that look to start twice for no obvious reason. These are |
| 227 | typically main frame requests, and the first request is AppCache. Can just |
| 228 | ignore it and move on with your life. |
| 229 | |
| 230 | * Some HTTP requests are not handled by URLRequestHttpJobs. These include |
| 231 | things like HSTS redirects (URLRequestRedirectJob), AppCache, ServiceWorker, |
| 232 | etc. These generally don't log as much information, so it can be tricky to |
| 233 | figure out what's going on with these. |
| 234 | |
| 235 | * Non-HTTP requests also appear in the log, and also generally don't log much |
| 236 | (blob URLs, chrome URLs, etc). |
| 237 | |
| 238 | * Preconnects create a "HTTP_STREAM_JOB" event that may create multiple |
| 239 | CONNECT_JOBs (or none) and is then destroyed. These can be identified by the |
| 240 | "SOCKET_POOL_CONNECTING_N_SOCKETS" events. |