In a previous post, I wrote about designing efficient UIs for trained users. Well, that is only half the story, or even less: What about the time when users are still on the learning curve? What about self-service applications where users need to be efficient right away? Mind you, it’s the slow users who drive ROI. If you want to improve application efficiency, you need to track down exactly where the seconds are lost. Survival analysis can help a lot here – to spot critical applications, to generate ideas about what might have gone wrong, and to motivate and evaluate activities to fix the problem.
I came across survival analysis some years ago, when I needed statistical techniques for analyzing time data from usability tests. Having grown up with means, standard deviations, and analysis of variance, it was an eye-opening experience I’d like to share. What an amazing toolkit!
Survival analysis was developed by insurance mathematicians who wanted to explore the factors influencing the life expectancy of their customers. Replace “life expectancy” with “time on task” and “customers” with “users”, and you see why I’m so excited about it. In fact, the math is the same, although admittedly somewhat scary to the non-specialist. Let’s see whether I can give you a human-readable introduction.
The Survival Function Tells Who’s Still Working
Consider Figure 1, which shows task completion times for 18 users from a usability study. The bars indicate how long each user worked on the task. I have sorted the bars by time, so for any given time you can see how many users have finished the task. I also added a logarithmic trendline, which fits the data rather well – not by chance, as we shall see.
Survival analysts are interested in the likelihood that an individual (user) “survives” (keeps working on the task) at a given point in time. It is calculated as the number of individuals still “alive”, divided by the number of individuals in the study, and is called the survival function S(t). In our case of task completion times, its complement F = 1 − S denotes the likelihood of solving the task. With only 18 subjects, our empirically observed survival function is rather jittery. What we eventually want is a smooth function that optimally estimates the survival function in the entire population of users. Once we have that, we can predict how many users will be able to solve a task within a given time – and much more, as we shall see.
In order to get there, we need to take two steps. First, we need to acknowledge that the user rank index is not an exact estimate of that person’s percentage rank in the entire population – it’s a number for counting, but not the midpoint of the corresponding percentage interval. Luckily, there is a simple formula to fix this, called the Median Rank Estimate:
S = 1 − (i−0.3)/(N+0.4)
where i is the rank index, and N the number of study participants.
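To make this concrete, here is a minimal sketch in Python (with NumPy) of the Median Rank Estimate applied to a sorted list of completion times. The function name and the example times are invented for illustration:

```python
import numpy as np

def survival_estimates(times):
    """Median Rank Estimate of the survival function S for a
    sample of task completion times: S_i = 1 - (i - 0.3)/(N + 0.4)."""
    t = np.sort(np.asarray(times, dtype=float))
    n = len(t)
    ranks = np.arange(1, n + 1)            # rank index i
    f = (ranks - 0.3) / (n + 0.4)          # estimated fraction finished, F
    return t, 1.0 - f                      # S = 1 - F

# Example with invented times (seconds):
t, s = survival_estimates([62, 75, 90, 110, 140, 200])
for ti, si in zip(t, s):
    print(f"{ti:5.0f}s  S = {si:.3f}")
```

Note that even the fastest user does not get S = 1 and the slowest does not get S = 0 – that is exactly the correction the Median Rank Estimate provides.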
The second step is a bit more tricky. We need to identify and approximate a function that optimally models the time data corresponding to our S (or F) values. This depends on the statistical distribution type the data follow, which in turn depends on the underlying process that generated those time data in the first place. There are a fair number of candidate distributions, but one sticks out: the Exponential.
Exponential distributions are typical for purely random processes, such as radioactive decay. What’s characteristic for those processes is that the proportion of items affected is constant over time. You may have heard the term “half-life”: if half the atoms of a substance decayed at time t, half of that half will have decayed after 2t. This is why the log curve in Figure 1 fits so well!
The Probability Plot Shows It All
This greatly simplifies our function approximation problem: if we plot S on a logarithmic scale, we can simply fit a straight line! So let’s plot the times in Fig. 1 again, but this time against the natural logarithm of S (which is negative, because S takes values between 0 and 1). Adding a trendline, together with the regression equation and R², takes two clicks in Excel, so let’s do this as well. To complete the picture, I added two lines for S=0.5 (half the users finished the task; orange) and S=0.25 (F=75% finished; grey).
The result is called a probability plot. Let’s see what we see. First, the data points actually do line up straight. The linear trendline (blue) explains 96.21% of the observed variance; this is what R² stands for – an indication that the exponential distribution model fits rather well. Second, consider the intersection points of the orange and grey lines with the trendline: they mark the times when 50% and 75% of users, respectively, have solved the task. When we solve the regression equation for x (time) at S values of 1, 0.5, and 0.25, we obtain 51.4s, 127.5s, and 203.7s, respectively. Let’s call those times t0, t50, and t75, according to the percentage of users who have solved the task by that time (t50 is also called the median time).
Now comes the fun part. If you calculate the time differences t50 − t0 and t75 − t50, you get 76.2s both times – that’s exactly the “half-life” of our exponential distribution. The next half of the users will take those 76.2s again to finish! Try for yourself to calculate the times when 12.5% and 6.25% of users are still busy. In fact, you can calculate, for any time, the percentage of users you can expect to have finished the task!
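These quantities all fall out of the fitted regression line ln S = a + b·t. A short sketch, using hypothetical coefficients chosen to be close to the plot described above (the function name is mine, not the article’s):

```python
import math

def times_from_fit(intercept, slope):
    """Given a fitted regression ln S = intercept + slope * t (slope < 0),
    recover t0 (S = 1), t50 (S = 0.5), t75 (S = 0.25) and the half-life."""
    t0 = -intercept / slope               # ln S = 0 means S = 1
    half_life = math.log(2) / abs(slope)  # time for S to halve
    t50 = t0 + half_life
    t75 = t0 + 2 * half_life
    return t0, t50, t75, half_life

# Hypothetical coefficients, roughly matching the plot in the article:
a, b = 0.4676, -0.009096
t0, t50, t75, hl = times_from_fit(a, b)
print(f"t0 = {t0:.1f}s, t50 = {t50:.1f}s, t75 = {t75:.1f}s, half-life = {hl:.1f}s")
```

The spacing t50 − t0 = t75 − t50 = half-life holds by construction: each further halving of S costs the same amount of time.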
Invest in Faster Hardware or Better Usability?
Let’s take a step back and have even more fun. What we have here is an exponential distribution which is, as statisticians say, translated by a constant time t0. This means that we can think of this process as a completely random process that started after a constant time t0 has passed. Let’s consider these random and constant components separately.
In a typical usability study, you have fairly constant system performance – that is, constant response times when following the straight solution path. The times users need to merely click through the UI on that path don’t vary much either. These two parts determine the constant time t0. The random process component is where users leave the straight path – click around, think about solutions, make mistakes, get lost – in other words, where they experience essentially random problems resulting either from task difficulties or from usability issues. The extent to which this happens is expressed by the slope of the regression line, which is inversely proportional to the half-life. This single number contains all the information needed to model the random part of the process!
Consider this: with the time constant t0 and the half-life of the random process, we can analyze technical performance and UI efficiency separately. Cool, isn’t it?
Want more fun? Suppose you conducted your study in order to decide whether to invest in faster hardware or a UX project. If you invest in hardware, you will improve the constant t0 but not the random “fiddling with usability issues” part of the distribution, its half-life. If you invest in UX, it’s the other way around. Let’s have a look at our plot again (figure 3) and consider what this means.
If you cut t0 by half (orange line), which would be quite impressive, you would gain 25.7s on all times – considering t50, that would be 101.9s instead of 127.5s. Very well.
If you cut the half-life time of the random process part by half (green line), you’ll end up with a t50 of 89.5s, and t75 of 127.5s instead of 203.7s. Well, that’s what I’d call impressive.
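The two what-if scenarios above can be replayed with a few lines of Python. The function is a hypothetical sketch of the shifted-exponential model; t0 and the half-life are the values read off the plot in the article:

```python
import math

def t_for_fraction(t0, half_life, fraction_remaining):
    """Time at which the given fraction of users is still working,
    under the shifted-exponential model: t = t0 + half_life * log2(1/S)."""
    return t0 - half_life * math.log2(fraction_remaining)

t0, hl = 51.4, 76.2  # values from the probability plot

# Faster hardware: halve the constant part t0
print("hardware, t50:", t_for_fraction(t0 / 2, hl, 0.5))
# Better usability: halve the half-life of the random part
print("UX, t50:", t_for_fraction(t0, hl / 2, 0.5))
print("UX, t75:", t_for_fraction(t0, hl / 2, 0.25))
```

Halving t0 shifts every quantile by a fixed 25.7s, while halving the half-life compounds: the later the quantile, the bigger the gain – which is why the UX investment wins so clearly for the slower half of the users.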
What’s Random and What Isn’t?
The fun is not over yet. So far we’ve looked only at the exponential distribution, which is generated by purely random processes. Other processes generate other distributions – analysts of technical system reliability have turned distribution gazing into an art form. We don’t have to go into this level of detail: any deviation from the exponential model indicates something that may have influenced an otherwise random process. To spot those influences, all we have to do is check whether data points deviate systematically from the straight line in our plot. Three classes of deviations are typical (figure 4):
- Data points to the left of the t0 intersection point. If the difference is more than the “normal” fluctuation around the trendline, it can indicate that someone has cheated in the test!
- Data points to the left of the trendline in the lower part of the plot. This indicates that slower users were not quite as slow as expected – there may be learning effects, or users who went astray were led back to the solution path by good error handling.
- Data points to the right of the trendline in the lower part of the plot. This indicates that slower users were slowed down even further by additional factors – fatigue, confusion, anger… clearly a signal to look closer into what happened to these poor folks. Note that, as a rule, slow users are not stupid but slowed down by something stupid, such as bad usability.
Fit for Survival…Analysis?
Let’s wrap up what we have so far:
- The survival function describes the percentage of users who are still working at a given time,
- We can estimate a comprehensive, quantitative model of task completion rates over time,
- We can separately analyze technical vs. usability-related factors that affect task completion times,
- Using probability plots assuming an exponential distribution of times, we can easily spot outliers as well as systematic deviations from a random process.
This approach works on basically any kind of time data – usability test data, tracking data from business processes, dwell times on web pages, etc. It can easily be automated in a spreadsheet: list your times, sort and rank them, calculate the corresponding S, take the natural log of S, and make a scatterplot – done.
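The same pipeline fits in a dozen lines of Python. This is a sketch of the whole procedure under the assumptions discussed above (shifted-exponential model, Median Rank Estimate); the synthetic data are generated from known parameters so the fit can be checked against them:

```python
import numpy as np

def fit_probability_plot(times):
    """The whole spreadsheet pipeline: sort, rank, Median Rank
    Estimate of S, ln S, straight-line fit - returning the two
    model parameters t0 (constant part) and half-life (random part)."""
    t = np.sort(np.asarray(times, dtype=float))
    n = len(t)
    s = 1.0 - (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    slope, intercept = np.polyfit(t, np.log(s), 1)
    return -intercept / slope, np.log(2) / abs(slope)

# Synthetic check: shifted-exponential quantiles with t0 = 40s and a
# half-life of 60s; the fit should recover both parameters closely.
p = (np.arange(1, 201) - 0.5) / 200              # evenly spaced quantile levels
times = 40 - (60 / np.log(2)) * np.log(1 - p)    # inverse CDF, shifted by t0
t0, hl = fit_probability_plot(times)
print(f"estimated t0 = {t0:.1f}s, half-life = {hl:.1f}s")
```

With real test data you would of course also plot t against ln S and inspect the deviations from the line, as described above.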
Of course this is just a first glance at the wonders to be found in the survival analysis surprise bag, which I hope was interesting for you. If you want to learn more, I published an article in the Journal of Usability Studies that goes into more detail of using probability plots on usability test data. In particular, the article discusses how to analyze and interpret other common time distributions, how to make use of time data from test participants who could not solve a task (if everything worked perfectly, we wouldn’t have to test), and how to compare time data from different test trials (after all, you want to know whether things improved after your redesign). The article also provides a list of further resources, and a spreadsheet for analyzing usability test data.