[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Showing posts with label performance. Show all posts

Saturday, February 01, 2025

When users interact

When looking at the Core Web Vitals, we often try optimizing each independently of the others, but that's not how users experience the web. A user's web experience is made up of many metrics, and it's important to look at these metrics together for each experience. Real User Measurement (RUM) allows us to do that by collecting operational metrics in conjunction with user actions, the combination of which can tell us whether our pages actually meet the user's expectations.

In this experiment, I decided to look at each of the events in a page's loading cycle, and break that down by when the user tried interacting with the page. For those interactions, I looked at the Interaction to Next Paint, and the rate of rage clicking to get an idea of user experience and whether that experience may have been frustrating or not.

Before I jump into the charts, I should note an important caveat about the data. This analysis was done using RUM data collected by Akamai's mPulse product, which collects data at or soon after page load. Not all page views resulted in an interaction before data was collected, so most of the analysis was restricted to page views with at least one interaction prior to data collection. On average, between 2% and 25% of beacons collected (across sites) had an interaction, and most sites had a recorded interaction on about 10% of beacons. I also separately looked at data collected during page unload/pagehide; while it captured more interactions, it did not have a noticeable effect on the results.

Each of the following charts is from a different website in mPulse's dataset.

Exploring the chart

Interaction Analysis - Virtual Globe Trotting
Interaction analysis chart for Virtual Globe Trotting

Let's now look at the various features of this chart.

The chart shows multiple dimensions of data projected onto a 2D surface, so some parts of it will appear wonky. We'll walk through that in this section.

Event labels

The first thing we'll describe is the events. These are the vertical colored lines with labels to their right, and they represent transition events in the page load cycle. Among them are First Paint, First Contentful Paint, Largest Contentful Paint (or Time to Visually Ready), and Time to Interactive.

You may have already noticed that in this particular chart, First Paint is _after_ First Contentful Paint, which is counter-intuitive. The reason is that the set of data points that report First Paint is different from the set that report First Contentful Paint. Safari and Firefox, for example, support FCP but not FP. When aggregating, the same percentile applied to two different data sets will likely give you values from two different experiences, and this effect is more pronounced when the sample sizes differ. In general we wouldn't expect the delta to be too far off, and in the data I've looked at, it hasn't been more than about 50ms.
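To illustrate why (with made-up numbers, not values from the mPulse dataset), consider the difference between taking the delta of two independently computed medians and taking the median of per-page-view deltas:

    // Hypothetical beacons: some browsers report only FCP, others report both FP and FCP
    var beacons = [
        { fp: 800,  fcp: 850 },
        { fp: 1200, fcp: 1300 },
        { fp: null, fcp: 600 },   // no First Paint support
        { fp: null, fcp: 700 }
    ];

    function median(values) {
        var sorted = values.slice().sort(function (a, b) { return a - b; });
        var mid = Math.floor(sorted.length / 2);
        return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Delta of aggregates: each median comes from a different set of page views
    var fp  = beacons.filter(function (b) { return b.fp !== null; }).map(function (b) { return b.fp; });
    var fcp = beacons.map(function (b) { return b.fcp; });
    var deltaOfAggregates = median(fcp) - median(fp);   // 775 - 1000 = -225ms: FP appears to come after FCP

    // Aggregate of deltas: only page views that report both metrics
    var deltas = beacons.filter(function (b) { return b.fp !== null; })
                        .map(function (b) { return b.fcp - b.fp; });
    var aggregateOfDeltas = median(deltas);              // 75ms: FP is before FCP on every single page view

This is also why the conclusion below recommends comparing the aggregate of deltas rather than the delta of aggregates.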

The events to keep an eye on are the Largest Contentful Paint or Time to Visually Ready, the Time to Interactive, and the delta between them. LCP is not currently supported on Safari, so we use boomerang's cross-browser calculation of TTVR in those cases.

Time to Interactive is considered a lab measurement, but `boomerang` measures it in a cross-browser manner during RUM sessions, and passes that data back to mPulse. It is approximately the time when interactions with the page are expected to be smooth due to no more long animation frames and blocking time.
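boomerang's actual algorithm is more involved and not reproduced here; as a rough sketch of the general idea (wait for a quiet window with no long tasks after the page is visually ready), something like the following could be used in browsers that support the Long Tasks API. The 500ms quiet window is an arbitrary assumption for illustration.

    // Rough sketch only: approximate an "interactive" time as the end of the last
    // long task before a sustained quiet window. Not boomerang's implementation.
    function watchForInteractive(visuallyReady, onTTI) {
        var QUIET_WINDOW = 500;    // assumed quiet window in ms
        var lastBusy = visuallyReady;

        var observer = new PerformanceObserver(function (list) {
            list.getEntries().forEach(function (entry) {
                lastBusy = Math.max(lastBusy, entry.startTime + entry.duration);
            });
        });
        observer.observe({ type: "longtask", buffered: true });

        (function check() {
            if (performance.now() - lastBusy >= QUIET_WINDOW) {
                observer.disconnect();
                onTTI(lastBusy);
            }
            else {
                setTimeout(check, 100);
            }
        })();
    }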

The next thing to note is that these events are positioned on this projection based on when they occurred relative to interactions _as well as_ when they occurred relative to page load time. By definition this means that all interactions should show up after LCP, but they may show up differently on the chart due to the projection from multiple dimensions down to two. There's also the fact that TTVR calculations do not stop at first interaction, so on browsers that do not support LCP, we may see interactions before the proxy for that event.

The absolute value of each event is calculated across the entire dataset, even on pages without interactions, so it might look like events aren't placed where their values dictate they should be; however, the percentage of users interacting before and after an event is always correct.

The last label to take note of is the fraction of users that interacted before `boomerang` considered the page to be interactive. In this case, it's 12% of users.

Data distributions
Interaction Analysis - Site 2
Interaction analysis chart showing mouseover details.

There are a few different distributions shown on this chart (and even more when we look at the mouseover in the chart above).

The blue area chart is the _population density_. It shows, for every 5% interval of the page load time, how many users first interacted with the page at that point in the page's loading cycle.
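A sketch of how such a density could be computed from raw beacons (the field names below are hypothetical and not mPulse's schema): express each page view's first interaction as a percentage of its own load time and count it into 5% buckets.

    // beacons: [{ firstInteraction: ms, loadTime: ms }, ...] (hypothetical shape)
    function interactionDensity(beacons) {
        var buckets = {};   // buckets[105] = first interactions landing at 105-110% of load time

        beacons.forEach(function (b) {
            if (!b.firstInteraction || !b.loadTime) {
                return;     // no recorded interaction on this page view
            }
            var pct = (b.firstInteraction / b.loadTime) * 100;
            var bucket = Math.floor(pct / 5) * 5;
            buckets[bucket] = (buckets[bucket] || 0) + 1;
        });

        return buckets;
    }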

The blue dots that trace the population density chart show the median _Interaction to Next Paint_ value for all of those interactions. Keep in mind that INP is not supported on Safari, whereas `boomerang`'s own measurements for TTI do work across browsers.

The vertical position of the red dots shows the _probability_ that interactions at that time resulted in _rage clicks_ while the size of the red dots shows the _intensity_ of these rage clicks. Rage clicks are collected across browsers.

The thin orange line shows Frustration Index for users that interacted within that window.

We also have the median Total Blocking Time for each of these interactions, though that's only visible in the live versions of these charts and not in most of the screenshots posted here.

In this second chart, we see that 59% of users interacted with the site before it became interactive. Its TTI is further from the LCP time compared to the first site.

Insights from the data

Interaction Analysis Fig. 3
Interaction analysis chart showing INP increasing around TTI.

When we look at this data across websites, we see the same patterns. Users expect to be able to interact with the site once the page is largely visible; however, the user experience for interactions is sub-optimal until the Time to Interactive, which can be much later in the page's loading cycle.

In most cases we see a high Total Blocking Time in the period between LCP and TTI, resulting in a slow INP, and higher probability of rage clicking.

When looking to optimize a site for user experience, we shouldn't look at each metric in isolation. A really fast LCP is a great first user experience, but it's also a signal to the user that they can proceed with interacting to complete their task. It's important that the rest of the page be ready for those interactions and keep up the good experience.

The elephant in the room

Interaction Analysis Fig. 4
Interaction analysis chart for Akamai.com focussing on the population series.

As an aside, has anyone else noticed that these charts almost always look like a sleeping elephant (or maybe a hat)? I've seen very few sites where this isn't the case, so I looked into that pattern.

The population distribution pattern we see is a gradual curve increasing, then a dip that looks like the elephant's neck, then a bump that could be its ears, a sharp dip and long flat region that could be its trunk.

It could well be a Normal distribution if it weren't for the dip and spike right around PLT.

Normal Distribution
A basic Normal Distribution curve with a mean of 75 and standard deviation of 30.

The drop-off after OnLoad is expected. `boomerang.js` sends a beacon on or soon after page load (sites can configure a beacon delay of a few seconds to capture post-onload events). This results in a drop-off in data with interactions after onload. The post onload interactions are on pages that are faster than the average.

The strange pattern is the spike in interactions just at or after onload (sometimes at 100% and sometimes at 105%). The dip at 95% and 100% shows up on most, but not all, sites; the spike shows up everywhere.

I looked closer at the data around those buckets and there is very little difference in terms of experience. The page load time, LCP time, TTI time, etc. are all very similar at the 25th and 75th percentile (in other words, the experiences are comparable). The only difference is that more users prefer to interact with the site just after the onload event has fired than just before it. It's not a big delay - about 200-400ms on average across sites, but it does look like some portion of users still wait for the loading indicator to complete before they interact.

Conclusions

In conclusion, I think there's a lot to be learned from looking at when your users interact with your site. Which parts of the page have finished loading when that interaction happens? What's still in flight? What do they experience? Is there too much of a delay between your LCP and the site becoming usable?

A good loading experience needs your page to transition from state to state smoothly without too much delay between states. Looking at the loading Frustration Index can identify pages where this isn't the case.

When comparing different events on the page, look at the aggregate of deltas rather than the delta of aggregates.

And lastly, keep an eye out for that elephant.

References

Glossary on Mozilla Developer Network
Web Vitals on Google's Web.Dev
Implementations in mPulse
Other useful links

Monday, August 30, 2021

The metrics game

A recent tweet by Punit Sethi about a Wordpress plugin that reduces Largest Contentful Paint (LCP) without actually improving user experience led to a discussion about faking/gaming metrics.

Core Web Vitals

Google recently started using the LCP and other Core Web Vitals (aka CWV) as a signal for ranking search results. Google's goal in using CWV as a ranking signal is to make the web better for end users. The understanding is that these metrics (Input delays, Layout shift, and Contentful paints) reflect the end user experience, so sites with good CWV scores should (in theory) be better for users... reducing wait time, frustration, and annoyance with the web.

If I've learnt anything over the last 20 years of working with the web, it's that getting to the top of a Google search result page (SRP) is a major goal for most site owners, so metrics that affect that ranking tend to be researched a lot. The LCP is no different, and the result often shows up in such "quick fix" plugins that Punit discusses above. Web performance (Page Load Time) was only ever spoken about as a sub-topic in highly technical spaces until Google decided to start using it as a signal for page ranking, and then suddenly everyone wanted to make their sites faster.

My background in performance

I started working with web performance in the mid 2000s at Yahoo!. We had amazing Frontend Engineering experts at Yahoo!, and for the first time, engineering processes on the front-end were as strong as the back-end. In many cases we had to be far more disciplined, because Frontend Engineers do not have the luxury of their code being private and running on pre-selected hardware and software specs.

At the time, Yahoo! had a performance team of one person — Steve "Chief Performance Yahoo" Souders. He'd gotten a small piece of JavaScript that measured front-end performance into the header of all pages by pretending it was an "Ad", and Ash Patel, who may have been an SVP at the time, started holding teams accountable for their performance.

Denial

Most sites' first reaction was to deny the results, showing scans from Keynote and Gomez, which at the time only measured load times synthetically, from the perspective of well-connected backbone agents, and whose numbers were very far off from what roundtrip was showing.

The Wall of Shame

I wasn't working on any public facing properties, but became interested in Steve's work when he introduced the Wall of Fame/Shame (depending on which way you sorted it). It would periodically show up on the big screen at URLs (the Yahoo! cafeteria). Steve now had a team of 3 or 4, and somehow in late 2007 I managed to get myself transferred into this team.

The Wall of Shame showed a kind of stock-ticker like view where a site's current performance was compared against its performance from a week ago, and one day we saw a couple of sites (I won't mention them) jump from the worst position to the best! We quickly visited the sites and timed things with a stop-watch, but they didn't actually appear much faster. In many instances they might have even been slower. We started looking through the source and saw what was happening.

The sites had discovered AJAX!

Faking it

There was almost nothing loaded on the page before the onload event. The only content was some JavaScript that ran on onload and downloaded the framework and data for the rest of the site. Once loaded, it was a long-lived single page application with far fewer traditional page views.

Site owners argued that it would make the overall experience better, and they weren't intentionally trying to fake things. Unfortunately we had no way to actually measure this, so we added a way for them to call an API when their initial framework had completed loading. That way we'd get some data to trend over time.

At Yahoo! we had the option of speaking to every site builder and working with them to make things better. Outside, though, it's a different matter.

Measuring Business Impact

Once we'd started LogNormal (and continuing with mPulse), and were serving multiple customers, it soon became clear that we'd need both business and engineering champions at each customer site. We needed to sell the business case for performance, but also make sure engineering used it for their benefit rather than gaming the metrics. We started correlating business metrics like revenue, conversions, and activity with performance. There is no cheap way to game these metrics because they depend on the behaviour of real users.

Sites that truly cared about performance, and the business impact of that performance, worked hard to make their sites faster.

This changed when Google started using speed as a ranking signal.

With this change, sites now had to serve two users, and when in conflict, Real Users lost out to Googlebot. After all, you can't serve real users if they can't see your site. Switching to CWV does not change the situation because things like Page Load Time, Largest Contentful Paint, and Layout Shift can all be faked or gamed by clever developers.

Ungameable Metrics

This brings us back to the metrics that we've seen couldn't be gamed. Things like time spent on a site, bounce rate, conversions, and revenue, are an indication of actual user behaviour. Users are only motivated by their ability to complete the task they set out to do, and using this as a ranking signal is probably a better idea.

Unfortunately, activity, conversions, and revenue are also fairly private corporate data. Leaking this data can affect stock prices and clue competitors in to how you're doing.

User frustration & CrUX

Now the goal of using these signals is to measure user frustration. Google Chrome periodically sends user interaction measurements back to their servers, collected as part of the Chrome User Experience Report (CrUX). This includes things like the actual user-experienced LCP, FID, and CLS. In my opinion, it should also include measures like rage clicks, missed and dead clicks, jank while scrolling, CPU busy-ness, battery drain, and so on: metrics that only come into play while a user is interacting with the site, and that affect or reflect how frustrating the experience may be.

It would also need to have buy-in from a few more browsers. Chrome has huge market share, but doesn't reflect the experience of all users. Data from mPulse shows that across websites, Chrome only makes up, on average, 44% of page loads. Edge and Safari (including mobile) also have a sizeable share. Heck, even IE has a 3% share on sites where it's still supported.

In the chart below, each box shows the distribution of a browser's traffic share across sites. The plot includes (in descending order of the number of websites with sizeable traffic for that browser) Chrome, Edge, Mobile Safari, Chrome Mobile, Firefox, Safari, Samsung Internet, Chrome Mobile iOS, Google, IE, and Chrome Mobile WebView.
Box Plot of browser share across websites.

It's unlikely that other browsers would trust Google with this raw information, so there probably needs to be an independent consortium that collects, anonymizes, and summarizes the data, and makes it available to any search provider.

Using something like the Frustration Index is another way to make it hard to fake ranking metrics without also accidentally making the user experience better.

Comparing these metrics with Googlebot's measures could hint at whether the metrics are being gamed, or perhaps Googlebot's measures could be given a lower weight, restricted to pages that haven't yet received a critical mass of real users.

We need to move the balance of ranking power back to the users whose experience matters!

Wednesday, November 18, 2020

Understanding Emotion for Happy Users

How does your site make your users feel?

Introduction

So you’ve come here for a post about performance, but here I am talking about emotion… what gives? I hope that if you haven’t already, then as this post progresses, you’ll see that performance and emotion are closely intertwined.

While we may be web builders, our goal is to run a business that provides services or products to real people. The website we build is a means of connecting people to that service or product.

The way things are…

The art and science of measuring the effects of signal latency on real users is now about 250 years old. We now call this Real User Measurement, or RUM for short, and it’s come a long way since Steve Souders’ early work at Yahoo.

Browsers now provide us with many APIs to fetch performance metrics that help site owners make sites faster. Concurrently, the Core Web Vitals initiative from Google helps identify metrics that most affect the user experience.

These metrics, while useful operationally, don’t give us a clear picture of the user experience, or why we need to optimise them for our site in particular. They don’t answer the business or human questions of, “Why should we invest in web performance?” (v/s for example, a feature that customers really want), or even more specifically, “What should we work on first?”.

Andy Davies recently published a post about the link between site speed and business outcomes…

Context influences experience,
Experience influences behaviour,
Behaviour influences business outcomes.

All of the metrics we collect and optimise for deal with context, and we spend very little time measuring and optimising the rest of the flow.

Switching Hats

Over the last decade working on boomerang and mPulse, we slowly came to the realisation that we’ve been approaching performance metrics from a developer centric view. We’d been drawing on our experience as developers – users who have browser dev tools shortcuts committed to muscle memory. We were measuring and optimising the metrics that were useful and easy to collect from a developer’s point of view.

Once we switched hats to draw on our experiences as consumers of the web, the metrics that really matter became clearer. We started asking better questions...

  • What does it mean that performance improved by 100ms?
  • Are all 100ms the same?
  • Do all users perceive time the same way?
  • Is performance all that matters?

In this post, we’ll talk about measuring user experience and its effects on behaviour, what we can infer from that behaviour, and how it affects business outcomes.

Delight & Frustration

In Group Psychology and the Analysis of Ego, Freud notes that “Frustration occurs when there is an inhibiting condition that interferes with or stops the realization of a goal.”

Users visit our sites to accomplish a goal. Perhaps they’re doing research to act on later, perhaps they want to buy something, perhaps they’re looking to share an article they read a few days ago.

Anything that slows down or prevents the user from accomplishing this goal can cause frustration. On the other hand, making their goal easy to find and achieve can be delightful.

How a user feels when using our site affects whether they’ll come back and “convert” into customers (however you may define convert).

The Link Between Latency & Frustration

In 2013, Tammy Everts and her team at Radware ran a usability lab experiment. The study hooked participants up to EEG devices and asked them to shop on certain websites. Half the users had an artificial delay added to their browsing experience, and neither group was made aware of the performance changes. They all believed they were testing the usability of the sites. The study showed that...

A 500ms connection speed delay resulted in up to a 26% increase in peak frustration and up to an 8% decrease in engagement.

Similarly in 2015, Ericsson ConsumerLab neuro research studied the effects of delayed web pages on mobile users and found that “Delayed web pages caused a 38% rise in mobile users' heart rates — equivalent to the anxiety of watching a horror movie alone.”

This may not be everyone’s cup of tea, and the real implication is that users make a conscious or unconscious decision on whether to stick around, return, or leave the site.

Cognitive Bias

Various cognitive biases influence how individual experiences affect perception and behaviour. Understanding these biases, and intervening when an experience tends toward the negative, can improve the overall experience.

Perceptual Dissonance

Also known as Sensory Dissonance, Perceptual Dissonance results from unexpected outcomes of common actions.

The brain’s predictive coding is what helps you do things like “figure out if a car coming down the road is going slow enough for you to cross safely”. A violation of this predictive coding is useful in that it helps us learn new things, but if that violation breaks long-standing “truths”, or if violations are inconsistent, it makes learning impossible and leads to psychological stress and frustration.

On the web, users expect websites to behave in a certain way. Links should be clickable, sites should in general scroll vertically, and so on. Things like jank while scrolling, nothing happening when a user clicks a link (dead clicks), or a click target moving as the user attempts to click on it (layout shift) cause perceptual dissonance and frustration.

If these bad experiences are consistent, then users come to expect them. Our data shows that users from geographies where the internet is slower than average tend to be more patient with web page loads.

Survivorship Bias

We only measure users who can reach our site. For some users, a very slow experience is better than an unreachable site.

In 2012, after YouTube made their site lighter, Chris Zacharias found that aggregate performance had gotten worse. On delving into the data, they found that new users who were previously unable to access the site were now coming in at the long tail. The site appeared slower in aggregate, but the number of users who could use it had gone up.

Negativity Bias

Users are more likely to remember and talk to their friends about their bad experiences with a site than they are about the good ones. We need only run a Twitter search for “$BRAND_NAME slow” to see complaints about bad experiences.

Bad experiences are also perceived to be far more intense than equivalent good experiences. To end up with a neutral overall experience, bad experiences need to be balanced with more intense good experiences. A single bad experience over the course of the session makes it harder to result in overall delight.

Active Listening

Research shows that practicing Active Listening can have a big impact on countering Negativity Bias. Simply acknowledging when you’ve screwed up and didn’t meet the user’s expectations can alleviate negative perception. If we detect, via JavaScript, that the page is taking too long to transition between loading states, we could perhaps display a message that acknowledges and apologizes for things going slower than expected.

Hey, we realise that it’s taking a little longer than expected to get to what you want. You deserve better. We’re sorry and hope you’ll stick around a bit.

Users will be more forgiving if their pain is acknowledged.
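A minimal sketch of what that could look like, assuming an arbitrary 4 second threshold and placeholder copy; a real implementation would tie the trigger to the page's own expected loading rhythm rather than a fixed number:

    // Illustrative only: apologise if the page has not become usable within an assumed window.
    var EXPECTED_USABLE_MS = 4000;

    var apologyTimer = setTimeout(function () {
        var banner = document.createElement("div");
        banner.className = "slow-load-apology";
        banner.textContent = "Sorry, this is taking longer than it should. Thanks for sticking around.";
        (document.body || document.documentElement).appendChild(banner);
    }, EXPECTED_USABLE_MS);

    // Cancel the apology once the page becomes usable; onload is a crude stand-in
    // for a proper interactive signal.
    window.addEventListener("load", function () {
        clearTimeout(apologyTimer);
    });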

Measuring Emotion

There are many ways we could measure the emotional state of users using our site. These range from active engagement to completely creepy. Naturally not all of these will be applicable for websites...

  • Use affective computing (facial analysis, EEGs, pulse tracking, etc.)
  • Ask the user via a survey popover
  • Business outcomes of behaviour
  • Behavioural analysis

Affective Computing

For website owners, affective computing isn’t really in play. Things like eye tracking, wireless brain interfaces, and other affective computing methodologies are too intrusive. They work well in a lab environment where users consent to this kind of tracking and can be hooked up to measurement devices. This is both inconvenient, and creepy to run on the web.

Ask the user

Asking the user can be effective, as shown by a recent study from Wikipedia. The study used a very simple Yes/No/No Comment style dialog with randomized order. They found that users’ perceived quality of experience is inversely proportional to median load time. A temporary 4% improvement in page load time resulted in an equally temporary 1% increase in satisfied users.

Area chart of two timeseries: Median loadEventEnd, and Satisfaction Ratio (positive/total). Time axis covers 1 year from Oct 2019 to Oct 2020. More details in the text preceding this image.

This method requires active engagement by the user and suffers from selection bias and the Hawthorne effect.

It’s hard to quantify what kinds of experiences would reduce the effects of selection bias and result in users choosing to answer the survey, or how you’d want to design the popover to increase self-selection.

The Hawthorne effect, on the other hand, suggests that individuals change the way they react to stimuli if they know they’re being measured or observed.

Business Outcomes

Measuring business outcomes is necessary but it can be hard to identify what context resulted in an outcome. One needs to first understand the intermediate steps of experience and behaviour. Did a user bounce because the experience was bad, or did they just drop in to do some research and will return later to complete a purchase?

Behavioural analysis

Applying the results of lab based research to users actively using a website can help tie experience to behaviour. We first need to introduce some new terms that we’ll define in the paragraphs that follow.

Rage Clicks, Wild Mouse, Scrandom, and Backtracking are behavioural signals we can use. In conjunction with when in a page’s life cycle users typically expect different events to take place, they can paint a picture of user expectations and behaviour.

Correlating these metrics with contextual metrics like Core Web Vitals on one hand, and business outcomes on the other can help us tell a more complete story of which performance metrics we should care about and why.

Rage, Frustration & Confusion

To measure Rage, Frustration & Confusion, we look at Rage Clicks, Wild Mouse and Backtracking.

Rage Clicks

Rage Clicks occur when users rapid-fire click on your site. It is the digital equivalent of cursing to release frustration. We’ve probably all caught ourselves rage clicking at some point. Click once, nothing happens, click again, still nothing, and then on and on. This could be a result of interaction delays, or of users expecting something to be clickable when it isn't.

Rage clicks can be measured easily and non-intrusively, and are easy to analyse.
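A minimal detection sketch; the thresholds (three clicks within two seconds inside a 30px radius) are assumptions for illustration rather than any standard definition:

    var RAGE_CLICKS = 3, RAGE_WINDOW_MS = 2000, RAGE_RADIUS_PX = 30;
    var recentClicks = [];

    document.addEventListener("click", function (e) {
        var now = Date.now();

        // keep only clicks that are close to this one in both time and space
        recentClicks = recentClicks.filter(function (c) {
            return now - c.t <= RAGE_WINDOW_MS &&
                   Math.abs(c.x - e.clientX) <= RAGE_RADIUS_PX &&
                   Math.abs(c.y - e.clientY) <= RAGE_RADIUS_PX;
        });
        recentClicks.push({ t: now, x: e.clientX, y: e.clientY });

        if (recentClicks.length >= RAGE_CLICKS) {
            // record a rage click, e.g. attach it to the RUM beacon
            console.log("rage click burst of", recentClicks.length);
        }
    });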

Fullstory has some great resources around Rage Clicks.

Wild Mouse

Research shows that people who are angry are more likely to use the mouse in a jerky and sudden, but surprisingly slow fashion.

People who feel frustrated, confused or sad are less precise in their mouse movements and move it at different speeds.

There are several expected mouse movements while a user traverses a website. Horizontal and vertical reading patterns are expected and suggest that the user is engaged in your content.

On the other hand, random patterns, or jumping between options in a form can suggest confusion, doubt, and frustration. See Churruca, 2011 for the full study.

The JavaScript library Dawdle.js can help classify these mouse patterns.

Scrandom

Scrandom is the act of randomly scrolling the page up and down with no particular scroll target. This can indicate that a user is unsure of the content, the page is too long, or is waiting for something to happen and making sure that the page is still responsive without accidentally clicking anything.
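One rough way to approximate this behaviour (the reversal count and the five second window are arbitrary choices for illustration):

    // Flag possible "scrandom" when scroll direction flips repeatedly in a short window.
    var reversals = 0, lastY = window.scrollY, lastDir = 0;

    window.addEventListener("scroll", function () {
        var dir = window.scrollY > lastY ? 1 : (window.scrollY < lastY ? -1 : 0);
        if (dir !== 0 && lastDir !== 0 && dir !== lastDir) {
            reversals++;
        }
        if (dir !== 0) {
            lastDir = dir;
        }
        lastY = window.scrollY;
    });

    setInterval(function () {
        if (reversals >= 4) {
            console.log("possible scrandom behaviour");
        }
        reversals = 0;  // evaluate each window independently
    }, 5000);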

Backtracking

Backtracking is the process of hitting the back button on the web. Users who are confused or lost on your site may hit the back button often to get back to a safe space. This behaviour may manifest itself in different ways, but can often be identified with very long sessions that appear to loop.

Tie this into the Page Load Timeline

In his post on Web Page Usability, Addy Osmani states that loading a page is a progressive journey with four key moments to it: Is it happening? Is it useful? Is it usable? and Is it delightful? And he includes this handy graphic to explain it:

When did the user feel they could interact? When could they interact? Speed metrics illustrate First Paint, First Contentful Paint, Time to Interactive for a page

The first three are fairly objective. With only minor differences between browsers, it’s straightforward to pull this information out of standard APIs, and possibly supplement it with custom APIs like User Timing.
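For example, the first few moments can be approximated from standard APIs where they are supported (the helper below is a sketch with feature detection, since support varies across browsers):

    // Approximate the early loading moments from the Paint Timing and Navigation Timing APIs.
    function loadingMoments() {
        var moments = {};

        if (window.performance && performance.getEntriesByType) {
            performance.getEntriesByType("paint").forEach(function (entry) {
                // "first-paint" and "first-contentful-paint", where supported
                moments[entry.name] = entry.startTime;
            });

            var nav = performance.getEntriesByType("navigation")[0];
            if (nav) {
                moments.domContentLoaded = nav.domContentLoadedEventEnd;
                moments.load = nav.loadEventEnd;
            }
        }

        return moments;
    }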

We’ve found that over 65% of users expect a site to be usable after elements have started becoming visible but before it is actually Interactive. Contrast that with 30% who will wait until after the onload event has fired.

Correlating Rage with Loading Events

Comparing the points in time when users rage click with the loading timeline above, we see some patterns.

Relative time series showing the intensity of rage clicks tied to when users first interact with a page relative to page load. We also include the First Input Delay as a separate series, and show 25th-75th percentile bands for the First Paint, Largest Contentful Paint, Visually Ready, and Interactive times relative to Page Load.
The horizontal axis on this chart is time as a relative percent of the full page load time. -50 indicates half of the page load time while +50 is 1.5x the page load time. The vertical axis indicates intensity of rage while point radius indicates probability of rage clicks at that time point. The coloured bars indicate 25th to 75th percentile ranges for the particular timer relative to full page load with the line going through indicating the median.

We see a large amount of rage between content becoming visible and the page becoming interactive. Users expect to be able to interact with the page soon after content becomes visible, and if that expectation isn’t met, it results in rage clicking.

We also see a small stream of rage clicks after the page has completed loading, caused by interaction delays.

There’s a small gap just before the onload event fires. The onload event is when many JavaScript event handlers run, which in turn result in Long Tasks, and increased Interaction Delays. What we’re seeing here is not the absence of any interaction, but survivorship bias where the interactions that happen at that time aren’t captured until later.

The horizontal axis on this chart is relative time along the page load timeline. We looked at various combinations of absolute and relative time across multiple timers, and it was clear that relative time is a stronger model, which brings us to a new metric based on relative timers...

Frustration Index

The frustration index, developed by Tim Vereecke, is a measure based on the relation between loading phases. We’ve seen that once one event occurs, users expect the next to happen within a certain amount of time. If we miss that expectation, the user's perception is that something is stopping or inhibiting their ability to complete their task, resulting in frustration.

The Frustration Index encapsulates that relationship. The formula we use is constantly under development as research brings new things to light, but it’s helpful to visit the website to understand exactly how it works and see some examples.
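As a toy illustration of the underlying idea only, and emphatically not Tim Vereecke's published formula (see the Frustration Index website for the real definition): penalize large gaps between successive loading phases rather than judging any single phase in isolation.

    // Toy score: sum up how far each gap between loading phases exceeds an assumed budget.
    // This is NOT the real Frustration Index formula.
    function toyFrustrationScore(phases) {
        // phases: e.g. [ttfb, firstContentfulPaint, timeToInteractive, loadTime] in ms, ascending
        var GAP_BUDGET = 1000;  // arbitrary 1s "expected" gap
        var score = 0;

        for (var i = 1; i < phases.length; i++) {
            score += Math.max(0, (phases[i] - phases[i - 1]) - GAP_BUDGET);
        }
        return score;
    }

    toyFrustrationScore([200, 1100, 4500, 5000]);   // 2400: the paint-to-interactive gap dominates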

So how do we know that this is a good metric to study?

Correlating Rage & Frustration

It turns out that there is a strong correlation (ρ=0.91) between the intensity of rage (vertical axis) that a user expresses and the calculated frustration index (horizontal axis) of the page.

Scatter Plot showing Frustration Index on the horizontal axis and intensity of rage clicks on the vertical axis. The two variables have a Pearson's correlation coefficient of 0.91.
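For reference, a coefficient like the ρ=0.91 above can be computed over the (frustration index, rage intensity) pairs with the standard Pearson formula; a small generic sketch:

    // Pearson's correlation coefficient between two equal-length series
    function pearson(xs, ys) {
        var n = xs.length;
        var sum = function (a) { return a.reduce(function (s, v) { return s + v; }, 0); };
        var meanX = sum(xs) / n, meanY = sum(ys) / n;

        var num = 0, dx2 = 0, dy2 = 0;
        for (var i = 0; i < n; i++) {
            var dx = xs[i] - meanX, dy = ys[i] - meanY;
            num += dx * dy;
            dx2 += dx * dx;
            dy2 += dy * dy;
        }
        return num / Math.sqrt(dx2 * dy2);
    }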

Rather than looking at individual timers for optimization, it is better to consider all the timers together. Improving one of them changes the user’s expectation of when other events should happen, and missing that expectation results in frustration.

Further, the formula is something we can apply client-side to determine whether we’re meeting expectations, and to practice active listening if we’re not.

Correlating Frustration & Business Outcomes

Looking at the correlation between Frustration Index and certain business metrics also shows a pattern.

Double Scatter Plot showing Frustration Index on the horizontal axis and bounce rate on the first vertical axis and average session duration in minutes on the second.
  • Bounce Rate is proportional to the frustration index with a sharp incline around what we call the LD50 point (for this particular site). ρb=0.65
  • Average Time spent on the site goes down as frustration increases, again sharply at first and then tapering off. ρt=-0.49
LD50

The LD50, or Median Lethal Dose is a term borrowed from the biological sciences. Buddy Brewer first applied the term to web performance in 2012, and we’ve been using it ever since.

In biology, it’s the dosage of a toxin that kills off 50% of the sample, be it tumour cells, or mice.

On the web, we think of it more in terms of when 50% of users decide not to move on in their journey. We could apply it to bounce rate, or retention rate, or any other rate that’s important to your site, and the “dose” may be a timer value, a frustration index, or anything else. Depending on the range of the metric in question, we may also use a percentile other than the median, for example, LD25 or LD75.
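A minimal sketch of reading an LD-style point off bucketed RUM data; the data shape and numbers are hypothetical, and a real analysis would want smoothing and confidence intervals:

    // buckets: [[metricValue, dropOffRate], ...] sorted by metricValue, dropOffRate in 0..1
    function ldPoint(buckets, target) {
        for (var i = 0; i < buckets.length; i++) {
            if (buckets[i][1] >= target) {
                return buckets[i][0];   // first value where the target drop-off is reached
            }
        }
        return null;    // drop-off never reaches the target in the observed range
    }

    // e.g. bounce rate by frustration index bucket (made-up numbers)
    var bounceByFI = [[5, 0.1], [10, 0.2], [20, 0.35], [30, 0.55], [50, 0.7]];
    ldPoint(bounceByFI, 0.5);   // 30 would be the LD50 for this site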

This isn’t a single magic number that works for all websites. It isn’t even a single number that works for all pages on a site or for all users. Different pages and sites have different levels of importance to a user, and a user’s emotional state, or even the state of their device (eg: low battery), when they visit your site can affect how patient they are.

Column chart showing the LD25 frustration index value for users from different Geos: US:26, Germany:10, Japan:18, Australia:42, Canada:44.

Patience is also a Cultural Thing

People from different parts of the world have a different threshold for frustration.

Many of our customers have international audiences and they have separate sites customized for each locale. We find that users from different global regions have different expectations of how fast a site should be.

In this chart, looking at 5 high GDP countries (that we have data for), we see a wide distribution in LD25 value across them, ranging from a value of 10 for Germany to the 40s for Australia and Canada. It’s not shown in this chart, but the difference is even wider when we look at LD50, with Germany at 14 and Canada at 100.

So how fast should our site be?

We’ve heard a lot about how our site’s performance affects the user experience, and consequently how people feel when using our site. We’ve seen how the “feel” of a site can affect the business, but what does all of that tell us about how to build our sites?

  • How fast should we be to reduce frustration?
  • What should we be considering in our performance budgets?
  • How do we leave our users feeling happy?

I think these may be secondary questions…

A better question to start with, is:

Will adding a new feature delight or frustrate the user?

Acknowledgements

Thanks to Andy Davies, Nic Jansma, Paul Calvano, Tim Vereecke, and Cliff Crocker for feedback on an earlier draft of this post.

Thanks also to the innumerable practitioners whose research I've built upon to get here including Addy Osmani, Andy Davies, Gilles Dubuc, Lara Hogan, Nicole Sullivan, Silvana Churruca, Simon Hearne, Tammy Everts, Tim Kadlec, Tim Vereecke, the folks from Fullstory, and many others that I'm sure I've missed.

Wednesday, May 27, 2015

Velocity Santa Clara 2015 -- My List

At Velocity SC 2015, these are the talks that I'd really like to see if only I could be in more than one room at a time.

Wednesday, May 27

09:00 Service Workers by Pat Meenan

Exciting new technology currently available in Chrome

11:00 Building performant SPAs by Chris Love

Since we're doing a lot of SPA work for boomerang at the moment, I'm very interested in performance best practices for SPAs

13:30 Metrics Metrics Everywhere by Tammy & Cliff

Tammy and Cliff are my colleagues at SOASTA, and this talk is based on a lot of the data that I've been working to collect over the last few years. I'm torn between this and the next one also at the same time.

13:30 Self-healing systems by Todd & Matt

Todd & Matt are also colleagues at SOASTA, and this talk is about the infrastructure we've developed to collect the metrics that are covered in the talk that Tammy & Cliff are doing. I really wish I could be at both.

15:30 Linux Perf Tools by Brendan Gregg

Always interested in tools to analyse linux performance.

Thursday, May 28

I haven't listed the keynotes here because that's the only track at the time and I don't need to choose which room to be in.

13:45 LinkedIn's use of RUM by Ritesh Maheshwari

LinkedIn uses a modified version of boomerang, and I'm keen to know what they've done.

13:45 Stream processing and anomaly detection by Arun Kejriwal

Very interesting topic, something that I'm very interested in.

13:45 Design & Performance by Steve Souders

Steve's talks are always educational

14:40 Visualising Performance Data by Mark Zeman

Again, this is something I'm working on at the moment, so very interested.

14:40 Failure is an Option by Ian Malpass

Etsy's devops talks are always educational.

16:10 Crafting performance alerting tools by Allison McKnight

I'm very interested in crafting alerts from RUM data.

Friday, May 29

09:00 RUM at MSN by Paul Roy

14:25 Missing Bandwidth by Bill Green

14:25 Winning Arguments with Performance Data by Buddy Brewer

17:05 All talks at this time slot

This last slot is unfortunate. Every talk at this slot is interesting and by good speakers.

Thursday, January 08, 2015

IE throws an "Invalid Calling Object" Exception for certain iframes

On a site that uses boomerang, I found a particular JavaScript error happening very often:
TypeError: Invalid calling object
This only happens on Internet Explorer, primarily IE 11, but I've seen it on versions as old as 9. I searched through Stack Overflow for the cause of this error, and while many of the cases sounded like they could be my problem, further investigation showed that my case didn't match any of them. The code in particular that threw the exception was collecting resource timing information for all resources on the page. Part of the algorithm involves drilling into iframes on the page, and this error showed up on one particular iframe. There are a few things to note:
   ("performance" in frame) === true;

   frame.hasOwnProperty("performance") === false;
The latter is not a surprise since hasOwnProperty("performance") is not supported for window objects on IE (I've seen this before when investigating JSLint problems.) There was no problem accessing frame.document, but accessing frame.performance threw an exception.
    frame.performance;    // <-- throws "TypeError: Invalid calling object" with error code -2147418113

    frame["performance"]; // <-- throws "TypeError: Invalid calling object" with error code -2147418113
In fact, frame.<anything except document> would throw the same exception. So I looked at the iframe's document object some more, and found this:
    frame.document.pathname === "/xxx/yyy/123/4323.pdf";
The frame was pointing to a PDF document, and while IE was creating a reference to hold the performance object of this document, it prevented any attempts to access this reference. I tested Chrome and Firefox, and they both create and populate a frame.performance object for PDF documents.
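The defensive pattern that follows from this (a sketch, not the exact code used in boomerang) is to wrap any access beyond frame.document in a try/catch and skip frames that throw:

    function getFramePerformance(frame) {
        try {
            // Any property other than .document may throw "Invalid calling object"
            // on IE for frames hosting PDF (or other plugin-rendered) documents.
            return ("performance" in frame) && frame.performance ? frame.performance : null;
        }
        catch (e) {
            return null;    // treat the frame as having no timing data
        }
    }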

Friday, August 22, 2014

jslint's suggestion will break your site: Unexpected 'in'...

I use jslint to validate my JavaScript before it goes out to production. The tool is somewhat useful, but you really have to spend some time ignoring all the false errors it flags. In some cases you can take its suggestions, while in others you can ignore them with no ill effects.

In this particular case, I came across an error, where, if you follow the suggestions, your site will break.

My code looks like this:

   if (!("performance" in window) || !window.performance) {
      return null;
   }

jslint complains saying:

Unexpected 'in'. Compare with undefined, or use the hasOwnProperty method instead.

This is very bad advice for the following reasons:

  • Comparing with undefined will throw an exception on Firefox 31 if used inside an anonymous iframe.
  • Using hasOwnProperty will cause a false negative on IE 10 because window.hasOwnProperty("performance") is false even though IE supports the performance timing object.

So the only course of action is to use `in` for this case.

Wednesday, December 12, 2012

A correlation between load time and usage

We frequently see reports of website usage going up as load time goes down, or vice-versa. It seems logical. Users use a site much more if it's fast, and less if it's slow. However, consider the converse too. Is it possible that a site merely appears to be faster because users are using it more, and therefore have more of it cached? I've seen sites where the server-side cache-hit ratio is much higher when usage is high resulting in lower latency. At this point I haven't seen any data that can convince me one way or the other. Do you?

Monday, August 13, 2012

Analyzing Performance Data with Statistics

At LogNormal, we’re all about collecting and making sense of real user performance data. We collect over a billion data points a month, and there’s a lot you can tell about the web and your users if you know what to look at in your data. In this post, I’d like to go over some of the statistical methods we use to make sense of this data.

You’d use these methods if you wanted to build your own Real User Measurement (RUM) tool.

The entire topic is much larger than I can cover in a single post, so go through the references if you’re interested in more information.

Tuesday, September 06, 2011

The Limits of Network Load Testing – Ephemeral Ports

I've been trying to run load tests against a node.js web app that I'm working on. I've hacked up my app to log (every 5 seconds) the number of requests per second that it handles and the amount of memory that it uses. I saw some strange numbers. I should also mention that I was using ab to throw load at the server. Perhaps not the best load tester around, but it served my purpose for this particular test.

My app started out well, handling about 1500 requests per second, but then all of a sudden it would stop serving requests. With repeated runs, the point where it would stop was always around 16K connections... in fact, it always hovered around 16384, mostly staying below, but sometimes going above it a bit (eg: 16400).

Ephemeral Ports

To those of you familiar with TCP networking, you'll recognise that value as the number of ephemeral ports in the IANA-specified range (49152 to 65535).

I started to log the state of all TCP connections every second using the following shell script (split onto multiple lines for readability):

while true; do
   echo -n "$( date +%s ) ";
   netstat | \
      awk '$4~/\.http-alt/ { conn[$6]++ }
           END {
              for(c in conn) {
                 printf("%s: %d\t", c, conn[c]);
              } 
              printf("\n");
           }
      ';
   sleep 1;
done
This is the output it returned:
1315341300 TIME_WAIT: 209	ESTABLISHED: 100	
1315341301 FIN_WAIT_1: 4	FIN_WAIT_2: 4	TIME_WAIT: 1892	ESTABLISHED: 92	
1315341302 FIN_WAIT_1: 1	FIN_WAIT_2: 2	TIME_WAIT: 3725	ESTABLISHED: 97	
1315341303 FIN_WAIT_2: 2	TIME_WAIT: 5426	ESTABLISHED: 97	
1315341304 FIN_WAIT_1: 1	FIN_WAIT_2: 5	TIME_WAIT: 7017	ESTABLISHED: 94	
1315341305 TIME_WAIT: 8722	ESTABLISHED: 100	
1315341306 FIN_WAIT_1: 2	TIME_WAIT: 10459	ESTABLISHED: 98	
1315341308 FIN_WAIT_1: 3	FIN_WAIT_2: 3	TIME_WAIT: 12246	ESTABLISHED: 94	
1315341309 FIN_WAIT_1: 7	TIME_WAIT: 14031	ESTABLISHED: 93	
1315341310 FIN_WAIT_1: 3	TIME_WAIT: 15937	ESTABLISHED: 97	
1315341311 TIME_WAIT: 16363	
1315341312 TIME_WAIT: 16363	
1315341314 TIME_WAIT: 16363	
1315341315 TIME_WAIT: 16363	
1315341316 TIME_WAIT: 16363	
1315341317 TIME_WAIT: 16363	
1315341318 TIME_WAIT: 16363	
1315341319 TIME_WAIT: 16363	
1315341321 TIME_WAIT: 16363	
1315341322 TIME_WAIT: 16363	
1315341323 TIME_WAIT: 16363	
1315341324 TIME_WAIT: 16363	
1315341325 TIME_WAIT: 16363	
1315341326 TIME_WAIT: 16363	
1315341328 TIME_WAIT: 16363	
1315341329 TIME_WAIT: 16363	
1315341330 TIME_WAIT: 16363	
1315341331 TIME_WAIT: 14321	
1315341332 TIME_WAIT: 12641	
1315341333 TIME_WAIT: 11024	
1315341334 TIME_WAIT: 9621	
1315341336 TIME_WAIT: 7516	
1315341337 TIME_WAIT: 5920	
1315341338 TIME_WAIT: 4227	
1315341339 TIME_WAIT: 2693	
1315341340 TIME_WAIT: 1108	
1315341341 TIME_WAIT: 23
There are a few things to note here.
  1. I ran ab with concurrency set to 100, so the number of connections in the ESTABLISHED, and FIN_WAIT_{1,2} states should be about 100 for the most part
  2. I was polling netstat once per second + the amount of time it took netstat to return, so you might notice missing seconds in between
  3. Requests were being handled at over 1000 per second, so many of the connections would have gone from the SYN_SENT state to TIME_WAIT in less than 1 second.

It's pretty clear here that everything just stops once the number of connections reaches 16363, and my app's logs confirm that. ab ends up timing out somewhere near the end of this report (which should probably be the subject of a different blog post).

Now if we look closely, we notice that it takes 30 seconds from the first result we see to when the number of connections in the TIME_WAIT state starts dropping below 16363.

TIME_WAIT

A few points should be noted about the TIME_WAIT state.
  1. Only the endpoint that sends the first FIN will end up in a TIME_WAIT state. I'll tell you later why this is important
  2. A connection remains in the TIME_WAIT state for twice the connection's MSL (See RFC793)

So we now have a whole load of TCP connections in the TIME_WAIT state and the client (the load tester) has run out of ephemeral ports to keep running the test. This is independent of the load tester used, and how it handles its sockets (poll, select, multiple threads, multiple processes, whatever). This is a limit imposed by the Operating System in order to comply with the TCP spec, and in general make sure the internet doesn't break.

Now there are various ways to get around this limit. The simplest is to just use multiple boxes to run your performance test all simultaneously hitting your server. The more boxes you use, the more likely it is that your server will break down before your client hits the port limit. But what if you don't have access to so many boxes (in a world without EC2 for example), or it's too expensive, or you're not utilising the other resources (CPU/RAM/Network Interface) on your client boxes and would like to get a little more out of them?

You could program your load tester to use SO_REUSEADDR, which would allow it to reuse sockets that are in the TIME_WAIT state, but that could backfire if there's still data on the network.

MSL

Now remember that TIME_WAIT is twice the MSL. RFC793 recommends setting the MSL to 2 minutes, which would leave TIME_WAIT at 4 minutes. Luckily no current implementation uses this value. Linux has its TIME_WAIT length hardcoded to 60 seconds (implying a 30 second MSL), while BSD sets the MSL to 15 seconds (sysctl net.inet.tcp.msl), tunable at run time.

To verify, I reran the test after setting the MSL to 5 seconds:

sudo sysctl -w net.inet.tcp.msl=5000

And these were the results:

1315344661 TIME_WAIT: 1451	ESTABLISHED: 100	
1315344662 FIN_WAIT_2: 5	TIME_WAIT: 3272	ESTABLISHED: 95	
1315344664 FIN_WAIT_1: 1	TIME_WAIT: 5010	ESTABLISHED: 99	
1315344665 TIME_WAIT: 6574	ESTABLISHED: 100	
1315344666 FIN_WAIT_1: 11	FIN_WAIT_2: 2	TIME_WAIT: 7908	ESTABLISHED: 87	
1315344667 TIME_WAIT: 9689	ESTABLISHED: 100	
1315344668 TIME_WAIT: 11155	ESTABLISHED: 100	
1315344669 FIN_WAIT_1: 3	TIME_WAIT: 12522	ESTABLISHED: 97	
1315344671 FIN_WAIT_2: 2	TIME_WAIT: 13655	ESTABLISHED: 98	
1315344672 FIN_WAIT_1: 4	FIN_WAIT_2: 12	TIME_WAIT: 13847	ESTABLISHED: 84	
1315344673 FIN_WAIT_1: 2	TIME_WAIT: 13218	ESTABLISHED: 98	
1315344674 FIN_WAIT_1: 2	TIME_WAIT: 13723	ESTABLISHED: 97	
1315344675 FIN_WAIT_1: 2	TIME_WAIT: 14441	ESTABLISHED: 98	
1315344677 FIN_WAIT_1: 4	FIN_WAIT_2: 6	TIME_WAIT: 13946	ESTABLISHED: 90	
1315344678 FIN_WAIT_1: 3	FIN_WAIT_2: 15	TIME_WAIT: 14670	ESTABLISHED: 82	
1315344679 FIN_WAIT_1: 2	TIME_WAIT: 15164	ESTABLISHED: 98	
1315344680 FIN_WAIT_1: 2	TIME_WAIT: 15062	ESTABLISHED: 98	
1315344681 TIME_WAIT: 15822	ESTABLISHED: 100	
1315344683 FIN_WAIT_1: 4	TIME_WAIT: 15855	ESTABLISHED: 82	
1315344684 FIN_WAIT_1: 15	TIME_WAIT: 15506	ESTABLISHED: 84	
1315344685 FIN_WAIT_2: 9	TIME_WAIT: 15928	ESTABLISHED: 91	
1315344687 FIN_WAIT_1: 4	TIME_WAIT: 15356	ESTABLISHED: 96	
1315344688 FIN_WAIT_1: 2	FIN_WAIT_2: 8	TIME_WAIT: 15490	ESTABLISHED: 90	
1315344689 FIN_WAIT_1: 1	TIME_WAIT: 15449	ESTABLISHED: 99	
1315344690 FIN_WAIT_1: 3	TIME_WAIT: 15801	ESTABLISHED: 97	
1315344692 FIN_WAIT_1: 1	TIME_WAIT: 15882	ESTABLISHED: 70	
1315344693 FIN_WAIT_1: 3	TIME_WAIT: 16106	ESTABLISHED: 56	
1315344694 FIN_WAIT_1: 3	TIME_WAIT: 15637	ESTABLISHED: 2	
1315344695 TIME_WAIT: 14166	
1315344697 TIME_WAIT: 12588	
1315344698 TIME_WAIT: 10454	
1315344699 TIME_WAIT: 8917	
1315344700 TIME_WAIT: 7441	
1315344701 TIME_WAIT: 5735	
1315344702 TIME_WAIT: 4119	
1315344703 TIME_WAIT: 1815	
1315344704 TIME_WAIT: 88	

The test runs longer here because it actually gets to 50000 connections. We're doing about 1500 requests per second, so it would take just under 11 seconds to exhaust all 16384 ports, but since ports are reusable in 10 seconds, we would have reclaimed the first 3000 ports in this time and then just continue to use them.

Here's the catch, though: you can only do this on BSD. Linux doesn't let you change the MSL. Also, DO NOT DO THIS IN PRODUCTION!

TIME_WAIT, FIN and HTTP keep-alive

Now I mentioned earlier that only the endpoint that sends the first FIN packet will enter the TIME_WAIT state. This is important because in production, you really do not want your server to end up with a lot of connections in the TIME_WAIT state. A malicious user could, for example, make a large number of HEAD requests to your server, and either wait for them to terminate, or specify Connection:close but leave the connection open for your server to close by sending the first FIN packet.

How do you deal with this?

Friday, June 17, 2011

NavigationTiming, IPv6, Instant Pages and other goodies in boomerang

There have been many additions to boomerang over the last few months. Among them include:
  • Full navigation timing API support added by Buddy Brewer (@bbrewer)
  • Measure IPv6 support and latency with the ipv6 plugin
  • Measure the performance of "Instant Pages" in Google Chrome (v13+)

NavigationTiming

With the navtiming.js plugin, boomerang will beacon back the entire navigation timing object for browsers that support it (Chrome 6+, IE 9+, and Firefox 6+ at this time). Use this information to gain far more insight into the user's browsing experience.

Docs here: http://yahoo.github.com/boomerang/doc/api/navtiming.html and here: http://yahoo.github.com/boomerang/doc/howtos/howto-9.html

IPv6

World IPv6 day is past, but IPv6 support still isn't universal. The ipv6.js plugin will help you find out if your users can access content over IPv6 and how that latency compares to whatever you currently provide. This does require server side setup though.

Docs here: http://yahoo.github.com/boomerang/doc/api/ipv6.html

Instant Pages

Google Chrome 13 has taken Mozilla's prefetch technology to the next level, introducing "prerender" -- a method to download and render the most likely next page before the user clicks on it, giving the appearance of the page loading Instantly when the user does click.

Boomerang is now prerender aware, and will check to see if a page was loaded through prerender or not. If it was, then boomerang measures a few more interesting times like the actual load time, the perceived load time, and the time from prerender completing to the page becoming visible.

Caveat: The measurement code causes current builds of Chrome 13 to crash. This bug appears to have been fixed in the nightlies (Chrome 14 Canary).

Docs here: http://yahoo.github.com/boomerang/doc/howtos/howto-10-page%231.html

Monday, December 27, 2010

Using bandwidth to mitigate latency

[This post is mirrored from the Performance Advent Calendar]

The craft of web performance has come a long way from yslow and the first few performance best practices. Engineers the web over have made the web faster and we have newer tools, many more guidelines, much better browser support and a few dirty tricks to make our users' browsing experience as smooth and fast as possible. So how much further can we go?

Physics

The speed of light has a fixed upper limit, though depending on the medium it passes through, it might be lower. In fibre, this is about 200,000 Km/s, and it's about the same for electricity through copper. This means that a signal sent over a cable that runs 7,000Km from New York to the UK would take about 35ms to get through. Channel capacity (the bit rate), is also limited by physics, and it's Shannon's Law that comes into play here.

This, of course, is the physical layer of our network. We have a few layers above that before we get to TCP, which does most of the dirty work to make sure that all the data that your application sends out actually gets to your client in the right order. And this is where it gets interesting.

TCP

Éric Daspet's article on latency includes an excellent discussion of how slow start and congestion control affect the throughput of a network connection, which is why Google has been experimenting with an increased TCP initial window size and wants to turn it into a standard. Each network roundtrip is limited by how long it takes photons or electrons to get through, and anything we can do to reduce the number of roundtrips should reduce total page download time, right? Well, it may not be that simple. We only really care about roundtrips that run end-to-end. Those that run in parallel need to be paid for only once.

When thinking about latency, we should remember that this is not a problem that has shown up in the last 4 or 5 years, or even with the creation of the Internet. Latency has been a problem whenever signals have had to be transmitted over a distance. Whether it is a rider on a horse, a signal fire (which incidentally has lower latency than light through fibre[1]), a carrier pigeon or electrons running through metal, each has had its own problems with latency, and these are solved problems.

C-P-P

There are three primary ways to mitigate latency. Cache, parallelise and predict[2]. Caching reduces latency by bringing data as close as possible to where it's needed. We have multiple levels of cache including the browser's cache, ISP cache, a CDN and front-facing reverse proxies, and anyone interested in web performance already makes good use of these. Prediction is something that's gaining popularity, and Stoyan has written a lot about it. By pre-fetching expected content, we mitigate the effect of latency by paying for it in advance. Parallelism is what I'm interested in at the moment.

Multi-lane highways

Mike Belshe's research shows that bandwidth doesn't matter much, but what interests me most is that we aren't exploiting all of this unused channel capacity. Newer browsers do a pretty good job of downloading resources in parallel, and with a few exceptions (I'm looking at you Opera), can download all kinds of resources in parallel with each other. This is a huge change from just 4 years ago. However, are we, as web page developers, building pages that can take advantage of this parallelism? Is it possible for us to determine the best combination of resources on our page to reduce the effects of network latency? We've spent a lot of time, and done a good job combining our JavaScript, CSS and decorative images into individual files, but is that really the best solution for all kinds of browsers and network connections? Can we mathematically determine the best page layout for a given browser and network characteristics[3]?

Splitting the combinative

HTTP Pipelining could improve throughput, but given that most HTTP proxies have broken support for pipelining, it could also result in broken user experiences. Can we parallelise by using the network the way it works today? For a high capacity network channel with low throughput due to latency, perhaps it makes better sense to open multiple TCP connections and download more resources in parallel. For example, consider these two pages I've created using Cuzillion:
  1. Single JavaScript that takes 8 seconds to load
  2. 4 JavaScript files that take between 1 and 3 seconds each to load for a combined 8 second load time.
Have a look at the page downloads using Firebug's Net Panel to see what's actually happening. In all modern browsers other than Opera, the second page should load faster, whereas in older browsers and in Opera 10, the first page should load faster.

Instead of combining JavaScript and CSS, split them into multiple files. How many depends on the browser and network characteristics. The number of parallel connections could start off based on the ratio of capacity to throughput, and would reduce as network utilisation improved through larger window sizes over persistent connections. We're still using only one domain name, so no additional DNS lookup needs to be done. The only unknown is the channel capacity, but based on the source IP address and a geo lookup[4] or a subnet to ISP map, we could make a good guess. Boomerang already measures the latency and throughput of a network connection, and the data gathered can be used to make statistically sound guesses.
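
To make the idea concrete, here's a purely illustrative sketch of that heuristic; the function and the numbers are mine, not something boomerang does, and the capacity estimate is assumed to come from the kind of geo or ISP lookup mentioned above:

// Suggest how many files to split a combined resource into, based on how
// much of the estimated channel capacity a single connection is using.
function suggestedParallelism(capacityKbps, measuredThroughputKbps, maxPerHost) {
  var ratio = capacityKbps / Math.max(measuredThroughputKbps, 1);
  return Math.max(1, Math.min(Math.round(ratio), maxPerHost || 6));
}

suggestedParallelism(8000, 2000);   // => 4: split the script into ~4 files
suggestedParallelism(2000, 1800);   // => 1: latency isn't the bottleneck, keep it combined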

I'm not sure if there will be any improvements or if the work required to determine the optimal page organisation will be worth it, but I do think it's worth more study. What do you think?

Footnotes

  1. Signal fires (or even smoke signals) travel at the speed of light in air vs. light through fibre; however, the switching time for signal fires is far slower, and you're limited to line of sight.
  2. David A. Patterson. 2004. Latency lags bandwidth [PDF]. Commun. ACM 47, 10 (October 2004), 71-75.
  3. I've previously written about my preliminary thoughts on the mathematical model.
  4. CableMap.info has good data on the capacity and latency of various backbone cables.

Wednesday, August 25, 2010

An equation to predict a page's roundtrip time


Note that I'm using MathJax to render the equations on this post. It can take a while to render everything, so you may need to wait a bit before everything shows up. If you're reading this in a feed reader, then it will all look like gibberish (or LaTeX if you can tell the difference).

So I've been playing with this idea for a while now, and bounced it off YSlow dev Antonia Kwok a few times, and we came up with something that might work. The question was whether we could estimate a page's expected roundtrip time for a particular user just by looking at the page structure. The answer is much longer than the question, but it tends towards possibly.

Let's break the problem down first. There are two large unknowns in there:
  • a particular user
  • the page's structure
These can be broken down into their atomic components, each of which is either known or measurable:
  • Network characteristics:
    • Bandwidth to the origin server ( \(B_O\) )
    • Bandwidth to your CDN ( \(B_C\) )
    • Latency to the origin server ( \(L_O\) )
    • Latency to your CDN ( \(L_C\) )
    • DNS latency to their local DNS server ( \(L_D\) )
  • Browser characteristics:
    • Number of parallel connections to a host ( \(N_{Hmax}\) )
    • Number of parallel connections overall ( \(N_{max}\) )
    • Number of DNS lookups it can do in parallel ( \(N_{Dmax}\) )
    • Ability to download scripts in parallel
    • Ability to download css in parallel (with each other and with scripts)
    • Ability to download images in parallel with scripts
  • Page characteristics:
    • Document size (\(S_O\) )
    • Size of each script (\(S_{S_i}\))
    • Size of each non-script resource (images, css, etc.) (\(S_{R_i}\))
    • Number of scripts ( \(N_S\))
    • Number of non-script resources (\(N_R\))
    • Number of hostnames (\(N_H\)), further broken down into:
      • Number of script hostnames (\(N_{SH}\))
      • Number of non-script hostnames (\(N_{RH}\))
All sizes are on the wire, so if a resource is sent across compressed, we consider the compressed size and not the uncompressed size. Additionally, scripts and resources within the page can be combined into groups based on the parallelisation factor of the browser in question. We use the terms \(SG_i\) and \(RG_i\) to identify these groups. We treat scripts and non-script resources differently because browsers treat them differently, i.e., some browsers will not download scripts in parallel even if they download other resources in parallel.

To simplify the equation a bit, we assume that bandwidth and network latency from the user to the CDN and the origin are the same. Additionally, the latency for the main page includes both network latency and the time it takes the server to generate the page (\(L_S\)). Often this time can be significant, so we redefine the terms slightly:
\begin{align}
B_O & = B_C \\
L_O & = L_S + L_C
\end{align}

Browser characteristics are easy enough to obtain. Simply pull the data from BrowserScope's Network tab. It contains almost all the information we need. The only parameter not listed is the number of parallel DNS lookups that a browser can make. Since it's better to err on the side of caution, we assume that this number is 1, so for all further equations, assume \(N_{Dmax} = 1\).
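
For the sketches further down this post, those browser characteristics can be bundled into a plain object; the property names and the numbers here are placeholders of mine, not BrowserScope data:

// Hypothetical browser profile; real values should come from BrowserScope.
var browser = {
  nHMax: 6,               // parallel connections per host
  nMax: 30,               // parallel connections overall
  nDMax: 1,               // parallel DNS lookups (assumed to be 1, as above)
  parallelScripts: true   // can scripts download in parallel with each other?
};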

Before I get to the equation, I should mention a few caveats. It's fairly naïve, assuming that all resources that can be downloaded in parallel will be downloaded in parallel, that there's no blank time between downloads, and that the measured bandwidth \(B_C\) is less than the actual channel capacity, so that multiple parallel TCP connections will all have access to the full bandwidth. This is not entirely untrue for high bandwidth users, but it does break down when we get down to dial-up speeds. Here's the equation:
\[
T_{RT} = T_P + T_D + T_S + T_R
\]
Where:
\begin{align}
T_P \quad & = \quad L_O + \frac{S_O}{B_C}\\
T_D \quad & = \quad \frac{N_H}{N_{Dmax}} \times L_D\\
T_S \quad & = \quad \sum_{i=1}^{N_{SG}} \left( \frac{S_{SG_imax}}{B_C} + L_C \right) \\
N_{SG} \quad & = \quad \left\{
\begin{array}{l l}
\frac{N_S}{\min \left( N_{Hmax} \times N_{SH}, N_{max} \right)} & \quad \text{if browser supports parallel scripts}\\
N_S & \quad \text{if browser does not support parallel scripts}
\end{array} \right. \\
S_{SG_imax} \quad & = \quad \text{Size of the largest script in script group } SG_i\\
T_R \quad & = \quad \sum_{i=1}^{N_{RG}} \left( \frac{S_{RG_imax}}{B_C} + L_C \right) \\
N_{RG} \quad & = \quad \frac{N_R}{\min \left( N_{Hmax} \times N_{RH}, N_{max} \right)}\\
S_{RG_imax} \quad & = \quad \text{Size of the largest resource in resource group } RG_i
\end{align}

So this is how it works...

We assume that the main page's download time is a linear function of its size, bandwidth, the time it takes for the server to build the page and the network latency between the user and the server. While this is not correct (consider multiple flushes, bursty networks, and other factors), it is close.

We then consider all scripts in groups based on whether the browser can handle parallel script downloads or not. Script groups are populated based on the following algorithm:
for each script:
   if size of group > Nmax:
      process and empty group
   else if number of scripts in group for a given host > NHmax:
      ignore script for the current group, reconsider for next group
   else
      add script to group

process and empty group
If a browser cannot handle parallel scripts, then we just temporarily set \(N_{max}\) to 1.

Similarly, we consider the case for all non-script resources:
for each resource:
   if size of group > Nmax:
      process and empty group
   else if number of resources in group for a given host > NHmax:
      ignore resource for the current group, reconsider for next group
   else
      add resource to group

process and empty group
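
Here's a runnable version of that grouping step, a sketch that follows the pseudocode above; scripts and non-script resources both go through it, represented here as hypothetical { size, host } objects:

// Split items into groups so that no group exceeds the browser's overall
// connection limit (nMax) or its per-host limit (nHMax). Items that don't
// fit are reconsidered for the next group, as in the pseudocode above.
function makeGroups(items, nMax, nHMax) {
  var groups = [], pending = items.slice();
  while (pending.length) {
    var group = [], perHost = {}, deferred = [];
    pending.forEach(function (item) {
      if (group.length >= nMax || (perHost[item.host] || 0) >= nHMax) {
        deferred.push(item);                        // retry in the next group
      } else {
        group.push(item);
        perHost[item.host] = (perHost[item.host] || 0) + 1;
      }
    });
    groups.push(group);
    pending = deferred;
  }
  return groups;
}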

For DNS, we assume that all DNS lookups are done sequentially. This makes our equation fairly simple, but turns our result into an overestimate.

Overall, this gives us a fairly good guess at what the roundtrip time for the page would be, but it only works well for high bandwidth values.
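
Putting the pieces together, this is roughly what evaluating the equation looks like in code, using makeGroups() from above and the browser object sketched earlier; sizes are in bytes, bandwidth in bytes per second, latencies in seconds, and all of the parameter names are mine:

// T_RT = T_P + T_D + T_S + T_R, as defined in the equation above.
function estimateRoundTrip(page, net, browser) {
  var tP = net.lO + page.docSize / net.bC;             // main document
  var tD = (page.numHosts / browser.nDMax) * net.lD;   // DNS lookups

  // Each group costs the download time of its largest member plus one L_C.
  function groupTime(items, nMax) {
    return makeGroups(items, nMax, browser.nHMax).reduce(function (total, group) {
      var sizes = group.map(function (item) { return item.size; });
      return total + Math.max.apply(null, sizes) / net.bC + net.lC;
    }, 0);
  }

  var tS = groupTime(page.scripts, browser.parallelScripts ? browser.nMax : 1);
  var tR = groupTime(page.resources, browser.nMax);
  return tP + tD + tS + tR;
}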

Our assumptions go wrong in a few places. For example, we don't consider the fact that resources may download in parallel with the page itself, or that when the smallest script/resource in a group has been downloaded, the browser can start downloading the next script/resource. We ignore the fact that some browsers can download scripts and resources in parallel, and we assume that the browser takes no time to actually execute scripts and render the page. These assumptions introduce an error into our calculations; however, we can overcome them in the lab. Since the primary purpose of this experiment is to determine the roundtrip time of a page without actually pushing it out to users, this isn't a bad thing.

So, where do we get our numbers from?

All browser characteristics come from BrowserScope.

The user's bandwidth is variable, so we leave it as a parameter to be filled in by the developer running the test. We could select 5 or 6 bandwidth values that best represent our users based on the numbers we get from boomerang. Again, since this equation breaks down at low bandwidth values, we could simply ignore those.

The latency to our CDN is something we can either pull out of data that we've already gathered from boomerang, or something we can calculate with a simple and not terribly incorrect formula:
\[
L_C = 4 \times \frac{distance\left(U \leftrightarrow C\right)}{c_{fiber}}
\]
Where \(c_{fiber}\) is the speed of light in fiber, which is approximately \(2 \times 10^8 m/s\).
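
As a worked example, a user roughly 1,000km from the CDN's edge node would get:
\[
L_C = 4 \times \frac{1 \times 10^6 m}{2 \times 10^8 m/s} = 20ms
\]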

DNS latency is a tough number, but since most people are fairly close to their ISPs, we can assume that it's between 40 and 80ms. The worst case is much higher than that, but on average, this should be correct.

The last number we need is \(L_S\), the time it takes for the server to generate the page. This is something that we can determine just by hitting our server from a nearby box, which is pretty much what we do during development. This brings us to the tool we use to do all the calculations.

YSlow already analyses a page's structure and looks at the time it takes to download each resource. We just pull the time out from what YSlow already has. YSlow also knows the size of all resources (both compressed and uncompressed), how many domains are in use and more. By sticking these calculations into YSlow, we could get a number that a developer can use during page development.

The number may not be spot on with what real users experience, but a developer should be able to compare two page designs and determine which of these will perform better even if they get the same YSlow score.

Naturally this isn't the end of the story. We've been going back and forth on this some more, and are tending towards more of a CPM approach to the problem. I'll write more about that when we've sorted it out.

For now, leave a comment letting me know what you think. Am I way off? Do I have the right idea? Can this be improved upon? Is this something you'd like to see in YSlow?

Thursday, July 29, 2010

Boomerang and the WebTiming API

I've been doing a lot of development on boomerang over the last few weeks, in particular around the WebTiming API. During Velocity this year, Microsoft announced that their latest release of IE9 supported this API, so I set about adding support to boomerang.

Since then, Google have announced support in Chrome as well, and there are rumours that Mozilla plans on adding it to Firefox. Since the specification is still a draft, none of these browsers use the window.performance object; instead, each uses its own browser-specific version. I've now added support for all these versions to boomerang.
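
The heart of the cross-browser handling is just picking up the timing object under whichever prefixed name the browser exposes; a simplified sketch (not boomerang's exact code) looks something like this:

// Grab the timing object under whichever name this browser exposes it.
var perf = window.performance ||
           window.msPerformance ||       // early IE9 builds
           window.webkitPerformance ||   // Chrome
           window.mozPerformance;        // Firefox, if/when it ships

if (perf && perf.timing) {
  var t = perf.timing;
  // e.g. full page load time as the browser saw it
  var pageLoadTime = t.loadEventStart - t.navigationStart;
  // ...add these to the boomerang beacon...
}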

More details in my post on the YDN blog.

You can get boomerang from github, and join the discussion on our forum.

Monday, July 12, 2010

4.01 Strict — SpeedFreak

Mild-mannered Stoyan Stefanov makes an appearance in this week's 4.01 Strict.

Stoyan (the speedfreak) Stefanov saves the web

Monday, June 28, 2010

My velocity talk

So I was supposed to speak at Velocity 2010 this year. My talk was titled Latency: Why you should worry and what you can do about it. It was one of the last few talks at the conference. Unfortunately, the thing about a conference like Velocity is that almost everything is about latency, and by the time I went on stage, absolutely every point that I'd planned on making had already been covered.

That's not a bad thing. For one, it validates my ideas, telling me that many other smart people have been thinking about the same things. Secondly, each of these speakers had 60-90 minutes to convey their ideas, which meant they could really do it justice. It also shows that Velocity is a pretty awesome conference to be at.

So, by the night of the 23rd, I realised that there was no way I could go ahead with my talk in its current state, and sat down to rewrite it. That's when the drama started.

Now my usual format is to build my slides using LaTeX and generate a PDF from that. I have a bunch of relevant images and charts in there, and it's not too hard to do that with LaTeX. I started going down that path, but realised that I wanted to do something different. I really wanted to reference all the other presentations and go over points that they'd covered, so I switched to Comic Life. I was up until 5am, working out the details and getting the script right.

The thing about a 12-minute talk is that you can't waste a single minute thinking about what you're going to say next. You just need to know it, and there wasn't much time for me to prepare. I'd also planned to announce boomerang in the latter part of my talk, so had to fit that in as well.

At 5 I realised that I was too sleepy to keep going, so I went to bed and figured I could finish things off in a couple of hours once I woke up. I was also too sleepy to back up the presentation to my server.

I woke up at 10, and opened up the mac to start work again, and it wouldn't come on. I could hear the fans whirring inside, and the startup sound would play, but the screen wouldn't turn on. It was now officially time to panic.

Two things were in my favour though. I work at Yahoo!, and the conference was just down the road from Yahoo! HQ. I headed in to the office and took the mac down to IT. The guys there were really helpful. They opened up a new Mac, and transferred my hard disk to it. The new one started up perfectly, and it's what I'm using right now.

I spent the next two hours putting the finishing touches on, and adding details for the boomerang release, and then rushed off to the conference with a slice of pizza in my hand.

Once I got on stage, I plugged in the mac, and as it had done on numerous occasions in the past, it hung as soon as the VGA adaptor was plugged in. I had to reboot while on stage, and this was all coming out of the 12 minutes that I had.

The talk went off okay. Looking back at the video, there's a bunch of things I'd have done differently, and some more details that I'd have covered, but I'll let you decide for yourselves.

My slides are on slideshare, while the video is on blip.tv.

Saturday, June 26, 2010

boomerang

I've been silent on this blog for a while, and the main reason for that is boomerang. It's taken a few months to get it ready for a public release, and I'm quite pleased with the way it's come out.

As many of you know, I've worked with the Exceptional Performance team at Yahoo! for over two years, and during this time, I've done a lot of work with measuring and analysing the performance of all of Yahoo!'s websites (yes, all). There are several tens of thousands of page types, and probably hundreds of thousands of unique URLs that could show up on a yahoo.com domain. This throws up some very interesting problems that I'm not sure I'd have had the chance to work with anywhere else.

I learnt a lot about scaling cheap hardware to meet the needs of this system, and also a lot about how each of these thousands of pages worked to improve their performance.

The measurement tool that we use has turned into boomerang. It's primarily a beaconing framework, so it isn't restricted to performance data, but the core plugins that come packaged with it only do performance measurement.

I'll write more in later posts about what you can do with boomerang, but for now, go get it from github.com/yahoo/boomerang. Fork it, use it, patch it, tell us what you think of it.
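
If you just want to see what wiring it up looks like, the minimal setup is to include boomerang.js on your page and point it at a beacon URL of your own (the URL below is a placeholder):

// Minimal boomerang setup, run after boomerang.js has been included.
BOOMR.init({
  beacon_url: "http://www.example.com/beacon.php"   // where measurement beacons are sent
});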

There's also documentation along with the source code, or you can have a look at my personal copy of them — frequently, but not automatically updated.

I announced it at Velocity 2010, and they made a video of my talk.

Friday, April 09, 2010

Analysis of the bandwidth and latency of YUIblog readers

A few months ago, Eric Miraglia from the YUI team helped me run some analysis on the types of network connections that were coming in to YUIBlog. In this article on the YUI blog, I've published the results of my analysis.

Please leave comments on the YUIblog.

Friday, March 19, 2010

WebDU

Wow, it's been another couple of weeks that I haven't updated this blog. Most of my time's gone in travelling. I was at ConFoo last week, and have a pending blog post about that, though it will be on the YDN blog. Next month I'll spend some time in India and then in May I head over to Bondi beach in Australia for WebDU.

I'll be speaking about web performance there and will probably do a pub event as well. Stay posted for more details. In any event, if Sydney is where you're at, then go register for WebDU. Geoff and crew put on a great show last year and it only looks to be getting better.

WebDU Speaker badge
