From an engineering perspective, reducing the runtime of an operation or process from 900 to 300 sec might be an impressive and significant achievement. However, to an end user it is likely to remain in the category “get a cup of coffee in the meantime”. Similarly, many users might not even notice a difference between a query that runs 1.5 sec and one that runs 0.5 sec, even though that is also a factor of 3. What does that imply?
A few years ago, I attended a presentation* by an eBay colleague who explained how they had analysed IE’s mechanism of rendering an HTML page. The purpose was to design eBay’s pages in such a way that the header ribbon would show up very quickly, in order to keep the user’s attention, and to defer less important pieces of the page, e.g. parts at the lower end that would not be part of the initially visible area of the window. You probably agree that this sounds like the natural thing to do. However, it is somewhat different from what is normally discussed around performance, as it puts the end user’s perception (of performance) above anything else. For example, the build-up of the complete page might even take longer as a result, yet the end user perceives it as faster because the parts that matter to them show up quickly.
How does that translate to analytics? Well, one example that I’ve seen a lot is that end users accept even slow performance if they recognise the processing effort behind, for example, a query, a load or a transformation job. For instance, they intuitively relate the amount of data in the query result to that effort. That works in many cases. However, a counter-example is a top-N query, e.g. calculating the top-10 customers with respect to revenue. The result is obviously a list of 10 customers with some additional information like the respective revenue. While that result list is very short, the amount of processing behind it can be huge. Simply imagine a utility, e-commerce or mobile phone company that has millions of customers. Basically, you need to check each and every one of those to see whether he or she qualifies for the top-10. So while the result is small, the effort is huge. It would probably be more acceptable if there was, for instance, a progress bar showing how the engine walks through the millions of customers.
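To make the top-N point a bit more tangible, here is a minimal Python sketch. It is not how any particular database engine works; it only illustrates that a top-10 result still requires a look at every customer, and that the scan can report its progress along the way. The names `with_progress`, `top_n` and `demo`, the `(customer id, revenue)` record shape and the random revenue figures are all made up for illustration.

```python
import heapq
import random
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (customer id, revenue)
Customer = Tuple[str, float]


def with_progress(
    rows: Iterable[Customer],
    total: int,
    report: Callable[[int, int], None],
    every: int = 100_000,
) -> Iterator[Customer]:
    """Yield rows unchanged and call report(done, total) every `every` rows."""
    done = 0
    for row in rows:
        yield row
        done += 1
        if done % every == 0 or done == total:
            report(done, total)


def top_n(rows: Iterable[Customer], n: int) -> List[Customer]:
    """Return the n customers with the highest revenue, best first.

    Only a handful of candidates are kept in memory at any time,
    but every single row still has to be inspected once.
    """
    return heapq.nlargest(n, rows, key=lambda c: c[1])


def demo(num_customers: int = 1_000_000, n: int = 10):
    """Run a top-n scan over synthetic customers and print progress."""
    rng = random.Random(42)  # fixed seed, purely synthetic data
    customers = (
        (f"C{i:07d}", rng.uniform(0, 10_000)) for i in range(num_customers)
    )
    updates: List[int] = []

    def report(done: int, total: int) -> None:
        updates.append(done)
        print(f"scanned {done:,} of {total:,} customers ({100 * done // total}%)")

    result = top_n(with_progress(customers, num_customers, report), n)
    return result, updates
```

Calling `demo()` prints one progress line per 100,000 scanned customers and then returns the ten winners. The final answer is just as small as before, but the user has seen the million rows go by, which is exactly the kind of feedback that makes the wait feel justified.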
Now, what can be concluded from all of that? I think that there is a tremendous and still frequently neglected opportunity to do a better job regarding performance, beyond purely benchmarking this or that SQL operator, query, stored procedure etc. As mentioned initially: bringing down a job from 900 to 300 sec can be a tremendous engineering achievement. Still, an end user will possibly not perceive it as such an outstanding improvement. And the end user is the one who must be our prime focus.
* The presentation was given by James Barrese at HTPS 2005. Unfortunately the link provided on the agenda page does not work (anymore).