Improve net-internals crash course. Review-Url: https://codereview.chromium.org/2501113002 Cr-Commit-Position: refs/heads/master@{#437884}

commit: 16cf6923451b77fd75f669dcf5632d143e5d98b1 [log] [tgz]
author: bnc <[email protected]> Mon Dec 12 16:28:10 2016
committer: Commit bot <[email protected]> Mon Dec 12 16:31:05 2016
tree: 5c3a3764fee07cdc57d9dc4799835706d3bc9557
parent: d4be7eeb348b6aa589b476033c0baeaba56e187f [diff] [blame]
diff --git a/net/docs/crash-course-in-net-internals.md b/net/docs/crash-course-in-net-internals.md
index 008a5755..9187f85 100644
--- a/net/docs/crash-course-in-net-internals.md
+++ b/net/docs/crash-course-in-net-internals.md

@@ -14,77 +14,138 @@
 For this reason, it lacks knowledge of tabs, navigation, frames, resource types,
 etc.
 
-The top level network stack object is the URLRequestContext.  The Events View
+The leftmost column presents a list of views.  Most debugging is done with the
+Events view, which will be all this document covers.
+
+The top level network stack object is the URLRequestContext.  The Events view
 has information for all Chrome URLRequestContexts that are hooked up to the
-single, global, ChromeNetLog object.  This includes both incognito and non-
-incognito profiles, among other things.  The Events view only shows events for
-the period that net-internals was open and running, and is incrementally updated
-as events occur.  The code attempts to add a top level event for URLRequests
-that were active when the tab was opened, to help debug hung requests, but
-that's best-effort only, and only includes requests for the current profile and
-the system URLRequestContext.
+single, global, ChromeNetLog object.  This includes both incognito and
+non-incognito profiles, among other things.  The Events view only shows events
+for the period that net-internals was open and running, and is incrementally
+updated as events occur.  The code attempts to add a top level event for
+URLRequests that were active when the about:net-internals tab was opened, to
+help debug hung requests, but that's best-effort only, and only includes
+requests for the current profile and the system URLRequestContext.
 
 The other views are all snapshots of the current state of the main
 URLRequestContext's components, and are updated on a 5 second timer.  These will
-show objects that were created before about:net-internals was opened.  Most
-debugging is done with the Events view (which will be all this document
-covers), but it's good to be aware of this distinction.
+show objects that were created before about:net-internals was opened.
 
 # Events vs Sources
 
-The Event View shows events logged by the NetLog.  The NetLog model is that
+The Events view shows events logged by the NetLog.  The NetLog model is that
 long-lived network stack objects, called sources, emit events over their
-lifetime.  When looking at the code, a "NetLogWithSource" object contains a source
-ID, and a pointer to the NetLog the source emits events to.  Some events have a
-beginning and end point (during which other subevents may occur), and some only
-occur at a single point in time.  Generally only one event can be occuring for a
-source at a time.  If there can be multiple events doing completely independent
-thing, the code often uses new sources to represent the parallelism.
+lifetime.  A NetLogWithSource object contains a source ID, a NetLogSourceType,
+and a pointer to the NetLog the source emits events to.
+
+The Events view has a list of sources in a column adjacent to the list of views.
+Sources that include an event with a net_error parameter with negative value
+(that is, some kind of ERR_) are shown with red background.  Sources whose
+opening event has not ended yet are shown with white background.  Other events
+have green background.  The search queries corresponding to the first two kinds
+are `is:error` and `is:active`.
+
+When one or more sources are selected, corresponding events show up in another
+column to the right, sorted by source, and by time within each source.  There
+are two time values: t is measured from some reference point common to all
+sources, and st is measured from the first event for each source.  Time is
+displayed in milliseconds.
+
+Since the network stack is asynchronous, events from different sources will
+often be interlaced in time, but Events view does not feature showing events from
+different sources ordered by time.  Large time gaps in the event list of a
+single source usually mean that time is spent in the context of another source.
+
+Some events come in pairs: a beginning and end event, between which other events
+may occur.  They are shown with + and - prefixes, respectively.  The begin event
+has a dt value which shows the duration.  If the end event was captured, then
+duration is calculated as the time difference between the begin and the end
+events.  Otherwise the time elapsed from the begin event until capturing
+was stopped is displayed (a lower bound for actual duration), followed by a +
+sign (for example, "dt=120+").
+
+If there are no other events in between the begin and end, and the end event has
+no parameters, then they are collapsed in a single line without a sign prefix.
+
+Some other events only occur at a single point in time, and will not have either
+a sign prefix, or a dt duration value.
+
+Generally only one event can be occuring for a source at a time.  If there can
+be multiple events doing completely independent things, the code often uses new
+sources to represent the parallelism.
+
+Most, but not all events correspond to a source.  Exceptions are global events,
+which have no source, and show up as individual entries in the source list.
+Examples of global events include NETWORK_CHANGED, DNS_CONFIG_CHANGED, and
+PROXY_CONFIG_CHANGED.
+
+# Common source types
 
 "Sources" correspond to certain net objects, however, multiple layers of net/
 will often log to a single source.  Here are the main source types and what they
-include (Excluding HTTP2 [SPDY]/QUIC):
+include (excluding HTTP2 [SPDY]/QUIC):
 
 * URL_REQUEST:  This corresponds to the URLRequest object.  It includes events
 from all the URLRequestJobs, HttpCache::Transactions, NetworkTransactions,
 HttpStreamFactoryImpl::Requests, HttpStream implementations, and
 HttpStreamParsers used to service a response.  If the URL_REQUEST follows HTTP
 redirects, it will include each redirect.  This is a lot of stuff, but generally
-only object is doing work at a time.  This event source includes the full URL
-and generally includes the request / response headers (Except when the cache
+only one object is doing work at a time.  This event source includes the full
+URL and generally includes the request / response headers (except when the cache
 handles the response).
 
-* HTTP_STREAM_JOB:  This corresponds to HttpStreamFactoryImpl::Job (Note that
+* HTTP_STREAM_JOB:  This corresponds to HttpStreamFactoryImpl::Job (note that
 one Request can have multiple Jobs).  It also includes its proxy and DNS
-lookups.  HTTP_STREAM_JOB log events are separate from URL_REQUEST because
-two stream jobs may be created and races against each other, in some cases -
-one for one for QUIC, and one for HTTP.  One of the final events of this source
-indicates how an HttpStream was created (Reusing an existing SOCKET /
-HTTP2_SESSION / QUIC_SESSION, or creating a new one).
+lookups.  HTTP_STREAM_JOB log events are separate from URL_REQUEST because two
+stream jobs may be created and races against each other, in some cases -- one
+for QUIC, and one for HTTP.
+
+    One of the final events of this source, before the
+    HTTP_STREAM_JOB_BOUND_TO_REQUEST event, indicates how an HttpStream was
+    created:
+
+    + A SOCKET_POOL_BOUND_TO_CONNECT_JOB event means that a new TCP socket was
+    created, whereas a SOCKET_POOL_REUSED_AN_EXISTING_SOCKET event indicates that
+    an existing TCP socket was reused for a non-HTTP/2 request.
+
+    + An HTTP2_SESSION_POOL_IMPORTED_SESSION_FROM_SOCKET event indicates that a
+    new HTTP/2 session was opened by this Job.
+
+    + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION event indicates that the request
+    was served on a preexisting HTTP/2 session.
+
+    + An HTTP2_SESSION_POOL_FOUND_EXISTING_SESSION_FROM_IP_POOL event means that
+    the request was pooled to a preexisting HTTP/2 session which had a different
+    SpdySessionKey, but DNS resolution resulted in the same IP, and the
+    certificate matches.
+
+    + There are currently no events logged for opening new QUIC sessions or
+    reusing existing ones.
 
 * \*_CONNECT_JOB:  This corresponds to the ConnectJob subclasses that each socket
-pool uses.  A successful CONNECT_JOB return a SOCKET.  The events here vary a
+pool uses.  A successful CONNECT_JOB returns a SOCKET.  The events here vary a
 lot by job type.  Their main event is generally either to create a socket, or
-request a socket from another socket pool (Which creates another CONNECT_JOB)
-and then do some extra work on top of that - like establish an SSL connection on
+request a socket from another socket pool (which creates another CONNECT_JOB)
+and then do some extra work on top of that -- like establish an SSL connection on
 top of a TCP connection.
 
 * SOCKET:  These correspond to TCPSockets, but may also have other classes
-layered on top of them (Like an SSLClientSocket).  This is a bit different from
+layered on top of them (like an SSLClientSocket).  This is a bit different from
 the other classes, where the name corresponds to the topmost class, instead of
 the bottommost one.  This is largely an artifact of the fact the socket is
-created first, and then SSL (Or a proxy connection) is layered on top of it.
+created first, and then SSL (or a proxy connection) is layered on top of it.
 SOCKETs may be reused between multiple requests, and a request may end up
 getting a socket created for another request.
 
-* HOST_RESOLVER_IMPL_JOB:  These correspond to HostResolverImpl::Job.  The
+* HOST_RESOLVER_IMPL_JOB:  These correspond to HostResolverImpl::Job.  They
 include information about how long the lookup was queued, each DNS request that
-was attempted (With the platform or built-in resolver) and all the other sources
+was attempted (with the platform or built-in resolver) and all the other sources
 that are waiting on the job.
 
-When one source depends on another, the code generally logs an event with
-"source_dependency" value to both sources, which lets you jump between the two
-related events.
+When one source depends on another, the code generally logs an event at both
+sources with a `source_dependency` value pointing to the other source.  These
+are clickable in the UI, adding the referred source to the list of selected
+sources.
 
 # Debugging
 
@@ -96,9 +157,9 @@
 about the actual problem.  There are two filters in net-internals that can help
 in a lot of cases:
 
-* "type:URL_REQUEST is:error" will restrict the list to URL_REQUEST object with
-an error of some sort (red background).  Cache errors are often non-fatal, so
-you should generally ignore those, and look for a more interesting one.
+* "type:URL_REQUEST is:error" will restrict the source list to URL_REQUEST
+objects with an error of some sort.  Cache errors are often non-fatal, so you
+should generally ignore those, and look for a more interesting one.
 
 * "type:URL_REQUEST sort:duration" will show the longest-lived requests first.
 This is often useful in finding hung or slow requests.
@@ -107,14 +168,11 @@
 about:net-internals.
 
 Once you locate the problematic request, the next is to figure out where the
-problem is - it's often one of the last events, though it could also be related
-to response or request headers.  You can use "source_dependency" links to drill
-down into other related sources, or up from layers below URL_REQUEST.
-
-You can use the name of an event to search for the code responsible for that
-event, and try to deduce what went wrong before/after a particular event.  Note
-that the event names used in net-internals are not the entire string names, so
-you should not do an entire string match.
+problem is -- it's often one of the last events, though it could also be related
+to response or request headers.  You can use `source_dependency` links to
+navigate between related sources.  You can use the name of an event to search
+for the code responsible for that event, and try to deduce what went wrong
+before/after a particular event.
 
 Some things to look for while debugging:
 
@@ -122,8 +180,7 @@
 
 * Changing networks and entering / exiting suspend mode can have all sorts of
 fun and exciting effects on underway network activity.  Network changes log a
-top level NETWORK_CHANGED event with no source - the event itself is treated as
-its own source.  Suspend events are currently not logged.
+top level NETWORK_CHANGED event.  Suspend events are currently not logged.
 
 * URL_REQUEST_DELEGATE / DELEGATE_INFO events mean a URL_REQUEST is blocked on a
 URLRequest::Delegate or the NetworkDelegate, which are implemented outside the
@@ -136,7 +193,8 @@
 
 * Sockets are often reused between requests.  If a request is on a stale
 (reused) socket, what was the previous request that used the socket, how long
-ago was it made?
+ago was it made?  (Look at SOCKET_IN_USE events, and the HTTP_STREAM_JOBS they
+point to via the `source_dependency` value.)
 
 * SSL negotation is a process fraught with peril, particularly with broken
 proxies.  These will generally stall or fail in the SSL_CONNECT phase at the
@@ -146,15 +204,15 @@
 issued by the media and PDF code.
 
 * Late binding:  HTTP_STREAM_JOBs are not associated with any CONNECT_JOB until
-a CONNECT_JOB actually connects.  This is so the highest priority pending job
-gets the first available socket (Which may be a new socket, or an old one that's
-freed up).  For this reason, it can be a little tricky to relate hung
-HTTP_STREAM_JOBs to CONNECT_JOBs.
+a CONNECT_JOB actually connects.  This is so the highest priority pending
+HTTP_STREAM_JOB gets the first available socket (which may be a new socket, or
+an old one that's freed up).  For this reason, it can be a little tricky to
+relate hung HTTP_STREAM_JOBs to CONNECT_JOBs.
 
 * Each CONNECT_JOB belongs to a "group", which has a limit of 6 connections.  If
-all CONNECT_JOBs beling to a group (The CONNECT_JOB's description field) are
+all CONNECT_JOBs belonging to a group (the CONNECT_JOB's description field) are
 stalled waiting on an available socket, the group probably has 6 sockets that
-that are hung - either hung trying to connect, or used by stalled requests and
+that are hung -- either hung trying to connect, or used by stalled requests and
 thus outside the socket pool's control.
 
 * There's a limit on number of DNS resolutions that can be started at once.  If
commit	16cf6923451b77fd75f669dcf5632d143e5d98b1	[log] [tgz]
author	bnc <[email protected]>	Mon Dec 12 16:28:10 2016
committer	Commit bot <[email protected]>	Mon Dec 12 16:31:05 2016
tree	5c3a3764fee07cdc57d9dc4799835706d3bc9557
parent	d4be7eeb348b6aa589b476033c0baeaba56e187f [diff] [blame]