# A Crash Course in Debugging with chrome://net-internals

This document is intended to help get people started debugging network errors
with chrome://net-internals, with some commonly useful tips and tricks. This
document is aimed more at how to get started using some of its features to
investigate bug reports, rather than as a feature overview.

It would probably be useful to read
[life-of-a-url-request.md](life-of-a-url-request.md) before this document.

# What Data Net-Internals Contains

chrome://net-internals provides a view of browser activity from net/'s
perspective. For this reason, it lacks knowledge of tabs, navigation, frames,
resource types, etc.

The leftmost column presents a list of views. Most debugging is done with the
Events view, which will be all this document covers.

The top level network stack object is the URLRequestContext. The Events view
has information for all Chrome URLRequestContexts that are hooked up to the
single, global, NetLog object. This includes both incognito and
non-incognito profiles, among other things. The Events view only shows events
for the period that net-internals was open and running, and is incrementally
updated as events occur. The code attempts to add a top level event for
URLRequests that were active when the chrome://net-internals tab was opened, to
help debug hung requests, but that's best-effort only, and only includes
requests for the current profile and the system URLRequestContext.

The other views are all snapshots of the current state of the main
URLRequestContext's components, and are updated on a 5 second timer. These will
show objects that were created before chrome://net-internals was opened.

# Events vs Sources

The Events view shows events logged by the NetLog. The NetLog model is that
long-lived network stack objects, called sources, emit events over their
lifetime. A NetLogWithSource object contains a source ID, a NetLogSourceType,
and a pointer to the NetLog the source emits events to.

The Events view has a list of sources in a column adjacent to the list of views.
Sources that include an event with a net_error parameter with a negative value
(that is, some kind of ERR_) are shown with a red background. Sources whose
opening event has not ended yet are shown with a white background. All other
sources have a green background. The search queries corresponding to the first
two kinds are `is:error` and `is:active`.

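The coloring rules above can be sketched offline. This is a minimal illustration only: the `Event` record and the `classify_source` helper are hypothetical and do not reflect the real NetLog data format, and giving the error rule precedence over the active rule is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified event record (not the real NetLog format)."""
    name: str
    phase: str  # "begin", "end", or "none" for point-in-time events
    params: dict = field(default_factory=dict)


def classify_source(events):
    """Return "error", "active", or "other" for one source's event list."""
    if not events:
        return "other"
    # Any event carrying a negative net_error marks the source as an error
    # (assumed here to take precedence over the "active" rule).
    if any(e.params.get("net_error", 0) < 0 for e in events):
        return "error"
    # The source is active if its opening (first) event began but no later
    # event with the same name ends it.
    first = events[0]
    if first.phase == "begin":
        ended = any(
            e.name == first.name and e.phase == "end" for e in events[1:]
        )
        if not ended:
            return "active"
    return "other"
```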
When one or more sources are selected, corresponding events show up in another
column to the right, sorted by source, and by time within each source. There
are two time values: t is measured from some reference point common to all
sources, and st is measured from the first event for each source. Time is
displayed in milliseconds.

Since the network stack is asynchronous, events from different sources will
often be interlaced in time, but the Events view cannot show events from
different sources in a single time-ordered list. Large time gaps in the event
list of a single source usually mean that time is spent in the context of
another source.

Some events come in pairs: a beginning and end event, between which other events
may occur. They are shown with + and - prefixes, respectively. The begin event
has a dt value which shows the duration. If the end event was captured, then
the duration is calculated as the time difference between the begin and the end
events. Otherwise the time elapsed from the begin event until capturing
was stopped is displayed (a lower bound for the actual duration), followed by a
+ sign (for example, "dt=120+").

If there are no other events in between the begin and end, and the end event has
no parameters, then they are collapsed into a single line without a sign prefix.

Some other events only occur at a single point in time, and will not have either
a sign prefix, or a dt duration value.

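As a rough sketch of the time arithmetic described above, the helpers below compute the st value and the dt label. The function names and the plain millisecond inputs are assumptions made for illustration; they are not part of net-internals. With a begin event at 380 ms and capture stopped at 500 ms, `format_dt(380, None, 500)` yields the lower-bound form `dt=120+`.

```python
def source_relative_times(event_times_ms):
    """Compute st for each event: time since the source's first event."""
    if not event_times_ms:
        return []
    first = event_times_ms[0]
    return [t - first for t in event_times_ms]


def format_dt(begin_ms, end_ms, capture_stop_ms):
    """Format the dt value shown on a begin event, in milliseconds.

    end_ms is None when the matching end event was never captured; the
    result is then a lower bound and carries a trailing "+".
    """
    if end_ms is not None:
        return f"dt={end_ms - begin_ms}"
    return f"dt={capture_stop_ms - begin_ms}+"
```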
Generally only one event can be occurring for a source at a time. If there can
be multiple events doing completely independent things, the code often uses new
sources to represent the parallelism.

Most, but not all, events correspond to a source. Exceptions are global events,
which have no source, and show up as individual entries in the source list.
Examples of global events include NETWORK_CHANGED, DNS_CONFIG_CHANGED, and
PROXY_CONFIG_CHANGED.

# Common source types

"Sources" correspond to certain net objects. However, multiple layers of net/
will often log to a single source. Here are the main source types and what they
include (excluding HTTP2 [SPDY]/QUIC):

* URL_REQUEST: This corresponds to the URLRequest object. It includes events
from all the URLRequestJobs, HttpCache::Transactions, NetworkTransactions,
HttpStreamRequests, HttpStream implementations, and HttpStreamParsers used to
service a response. If the URL_REQUEST follows HTTP redirects, it will include
each redirect. This is a lot of stuff, but generally only one object is doing
work at a time. This event source includes the full URL and generally includes
the request / response headers (except when the cache handles the response).

* HTTP_STREAM_JOB: This corresponds to HttpStreamFactory::Job (note that one
    Request can have multiple Jobs). It also includes its proxy and DNS lookups.
    HTTP_STREAM_JOB log events are separate from URL_REQUEST because, in some
    cases, two stream jobs may be created and race against each other -- one
    for QUIC, and one for HTTP.

    One of the final events of this source, before the
    HTTP_STREAM_JOB_BOUND_TO_REQUEST event, indicates how an HttpStream was
    created:

    + A SOCKET_POOL_BOUND_TO_CONNECT_JOB event means that a new TCP socket was
    created, whereas a SOCKET_POOL_REUSED_AN_EXISTING_SOCKET event indicates that
    an existing TCP socket was reused for a non-HTTP/2 request.

    + An HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET event indicates that a
    new HTTP/2 session was opened by this Job.

    + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION event indicates that the request
    was served on a preexisting HTTP/2 session.

    + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION_FROM_IP_POOL event means that
    the request was pooled to a preexisting HTTP/2 session which had a different
    SpdySessionKey, but DNS resolution resulted in the same IP, and the
    certificate matches.

    + There are currently no events logged for opening new QUIC sessions or
    reusing existing ones.

* \*_CONNECT_JOB: This corresponds to the ConnectJob subclasses that each socket
pool uses. A successful CONNECT_JOB returns a SOCKET. The events here vary a
lot by job type. Their main event is generally either to create a socket, or
request a socket from another socket pool (which creates another CONNECT_JOB)
and then do some extra work on top of that -- like establish an SSL connection on
top of a TCP connection.

* SOCKET: These correspond to TCPSockets, but may also have other classes
layered on top of them (like an SSLClientSocket). This is a bit different from
the other classes, where the name corresponds to the topmost class, instead of
the bottommost one. This is largely an artifact of the fact the socket is
created first, and then SSL (or a proxy connection) is layered on top of it.
SOCKETs may be reused between multiple requests, and a request may end up
getting a socket created for another request.

* HOST_RESOLVER_IMPL_JOB: These correspond to HostResolverImpl::Job. They
include information about how long the lookup was queued, each DNS request that
was attempted (with the platform or built-in resolver) and all the other sources
that are waiting on the job.

When one source depends on another, the code generally logs an event at both
sources with a `source_dependency` value pointing to the other source. These
are clickable in the UI, adding the referred source to the list of selected
sources.

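Clicking through `source_dependency` links by hand amounts to walking a graph of sources. The sketch below shows that walk over a hypothetical mapping from a source ID to the IDs its events point at; the mapping and the `related_sources` helper are assumptions for illustration, not a net-internals API. Because both sources usually log a link to each other, the walk has to tolerate cycles.

```python
from collections import deque


def related_sources(start_id, dependencies):
    """Return all source IDs reachable from start_id, breadth-first.

    dependencies is a hypothetical dict mapping a source ID to the list of
    source IDs named by its source_dependency values.
    """
    seen = [start_id]
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for dep in dependencies.get(current, []):
            # Links are typically logged in both directions, so skip
            # sources that were already visited.
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen
```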
# Debugging

When you receive a report from the user, the first thing you'll generally want
to do is find the URL_REQUEST[s] that are misbehaving. If the user gives an
ERR_* code or the exact URL of the resource that won't load, you can just search
for it. If it's an upload, you can search for "post", or if it's a redirect
issue, you can search for "redirect". However, you often won't have much
information about the actual problem. There are two filters in net-internals
that can help in a lot of cases:

* "type:URL_REQUEST is:error" will restrict the source list to URL_REQUEST
objects with an error of some sort. Cache errors are often non-fatal, so you
should generally ignore those, and look for a more interesting one.

* "type:URL_REQUEST sort:duration" will show the longest-lived requests first.
This is often useful in finding hung or slow requests.

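To make the effect of the two filters above concrete, here is a small sketch that applies the same selection and ordering to a list of hypothetical source summaries. The dictionary keys (`type`, `is_error`, `duration_ms`) and the `filter_sources` helper are invented for this example and are not how net-internals stores or filters its data.

```python
def filter_sources(sources, source_type=None, only_errors=False,
                   sort_by_duration=False):
    """Select and order hypothetical source summaries.

    Mirrors the intent of "type:...", "is:error" and "sort:duration":
    keep matching sources, optionally longest-lived first.
    """
    result = [
        s for s in sources
        if (source_type is None or s["type"] == source_type)
        and (not only_errors or s["is_error"])
    ]
    if sort_by_duration:
        # Longest-lived sources first, which surfaces hung or slow requests.
        result.sort(key=lambda s: s["duration_ms"], reverse=True)
    return result
```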
For a list of other filter commands, you can mouse over the question mark on
chrome://net-internals.

Once you locate the problematic request, the next step is to figure out where
the problem is -- it's often one of the last events, though it could also be
related to response or request headers. You can use `source_dependency` links to
navigate between related sources. You can use the name of an event to search
for the code responsible for that event, and try to deduce what went wrong
before/after a particular event.

Some things to look for while debugging:

* CANCELLED events almost always come from outside the network stack.

* Changing networks and entering / exiting suspend mode can have all sorts of
fun and exciting effects on underway network activity. Network changes log a
top level NETWORK_CHANGED event. Suspend events are currently not logged.

* URL_REQUEST_DELEGATE_\* / NETWORK_DELEGATE_\* / DELEGATE_INFO events mean a
URL_REQUEST is blocked on a URLRequest::Delegate or the NetworkDelegate, which
are implemented outside the network stack. A request will sometimes be CANCELLED
here for reasons known only to the delegate. Or the delegate may cause a hang.
In general, to debug issues related to delegates, one needs to figure out which
method of which object is causing the problem. The object may be a
NetworkDelegate, a ResourceThrottle, a ResourceHandler, the ResourceLoader
itself, or the ResourceDispatcherHost.

* Sockets are often reused between requests. If a request is on a stale
(reused) socket, what was the previous request that used the socket, and how
long ago was it made? (Look at SOCKET_IN_USE events, and the HTTP_STREAM_JOBs
they point to via the `source_dependency` value.)

* SSL negotiation is a process fraught with peril, particularly with broken
proxies. These will generally stall or fail in the SSL_CONNECT phase at the
SOCKET layer.

* Range requests have magic to handle them at the cache layer, and are often
issued by the media and PDF code.

* Late binding: HTTP_STREAM_JOBs are not associated with any CONNECT_JOB until
a CONNECT_JOB actually connects. This is so the highest priority pending
HTTP_STREAM_JOB gets the first available socket (which may be a new socket, or
an old one that's freed up). For this reason, it can be a little tricky to
relate hung HTTP_STREAM_JOBs to CONNECT_JOBs.

* Each CONNECT_JOB belongs to a "group", which has a limit of 6 connections. If
all CONNECT_JOBs belonging to a group (the CONNECT_JOB's description field) are
stalled waiting on an available socket, the group probably has 6 sockets that
are hung -- either hung trying to connect, or used by stalled requests and
thus outside the socket pool's control.

* There's a limit on the number of DNS resolutions that can be started at once.
If everything is stalled while resolving DNS addresses, you've probably hit this
limit, and the DNS lookups are also misbehaving in some fashion.

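The per-group limit in the list above suggests a simple check when many CONNECT_JOBs look stalled: count the sockets per group and see which groups are full. The sketch below assumes you have already collected, by hand or otherwise, the group name of each stalled CONNECT_JOB and a socket count per group; the input shapes, the example group names, and the `saturated_groups` helper are all hypothetical.

```python
MAX_SOCKETS_PER_GROUP = 6  # per-group connection limit described above


def saturated_groups(stalled_job_groups, sockets_per_group):
    """Return groups that have stalled CONNECT_JOBs and no free socket slot.

    stalled_job_groups: group name of each stalled CONNECT_JOB (hypothetical
        input, e.g. copied from each job's description field).
    sockets_per_group: dict mapping group name to its current socket count.
    """
    return sorted(
        group
        for group in set(stalled_job_groups)
        if sockets_per_group.get(group, 0) >= MAX_SOCKETS_PER_GROUP
    )
```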
# Miscellany

These are just miscellaneous things you may notice when looking through the
logs.

* URLRequests that look to start twice for no obvious reason. These are
typically main frame requests, and the first request is AppCache. You can just
ignore it and move on with your life.

* Some HTTP requests are not handled by URLRequestHttpJobs. These include
things like HSTS redirects (URLRequestRedirectJob), AppCache, ServiceWorker,
etc. These generally don't log as much information, so it can be tricky to
figure out what's going on with these.

* Non-HTTP requests also appear in the log, and also generally don't log much
(blob URLs, chrome URLs, etc).

* Preconnects create an "HTTP_STREAM_JOB" source that may create multiple
CONNECT_JOBs (or none) and is then destroyed. These can be identified by the
"SOCKET_POOL_CONNECTING_N_SOCKETS" events.