SAP Lumira Chart Extensions with a Predictive Flav...

JayThvV · ‎01-27-2015

There is a massive shift underway in analytics. The line is blurring between traditional analytics, where we primarily looked at the past and used line, bar and pie charts to represent data, and predictive analytics, which, taking a very broad view, includes statistics as well as machine learning and deep learning. We no longer look just at "what happened", but increasingly at "why did it happen" and "what is going to happen". Add to that Big Data, which is hard to make sense of without statistical or predictive analysis, and it is clear that our visualization tools will increasingly need to be able to visualize the results of such analysis.

In this blog post, I will introduce three Lumira extensions that show how this can be done. The code for these with sample files is available from SAP's lumira-viz-library repository on GitHub.

Forecast with 80% and 90% confidence intervals

The first chart shows actuals with a forecast, along with 80% and 90% confidence intervals. Confidence intervals are a standard statistical technique for showing how much certainty there is in a forecast. The narrower the confidence intervals, the more reliable the forecast; the wider they are, the more uncertainty we have to deal with. Suppose we are using such a forecast to decide where to invest. If the confidence intervals are very wide, looking only at the forecast line could be dramatically misleading.

You can clearly see that in this example (based on per capita GDP WDI data from 2014) the confidence interval for Australia is really narrow (no surprise, as the actuals are very smooth to begin with). But the situation is very different for Greece. While the forecast itself shows an upward trend, the confidence intervals are really wide, and we should certainly be prepared (simply based on past performance) for the possibility that it doesn't recover at all. (Obviously, predictive algorithms can't predict political changes. Who knows what will happen to Greece? The point is that there is a great deal of uncertainty...)

Forecast with single confidence interval

Once I had developed the chart above, a colleague asked me if we could do the same with a single confidence interval, and most certainly we can. In this example, we have a seasonal time series, and you can see the chart handles that pretty well.

Holt-Winters Exponential Smoothing

Another common predictive analysis is exponential smoothing, where we give the algorithm a dataset and it smooths out the noise to reveal a clearer signal, typically as a trend line of some kind. There are different variations of this; to produce the data file I used Holt-Winters. The chart should work just as well with other smoothing techniques, including moving averages. In this case, we're applying Holt-Winters exponential smoothing (in red) to a seasonally adjusted time series of tomatoes sold by weight (in blue) to see if there is a trend. (There isn't really one; sales are largely stable. There is a lot of daily variation even after removing the seasonal effect, but the exponential smoothing shows there is just slight growth over the ~2.5 years included in this set.)
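To make the idea of smoothing concrete, here is a minimal sketch of simple (single) exponential smoothing in JavaScript. It is not Holt-Winters, which adds trend and seasonal components, and it is not the code used to produce the data file. The function name `exponentialSmoothing`, the parameter `alpha` and the sample values are assumptions for illustration only.

```javascript
// Simple exponential smoothing: each smoothed value is a weighted average of
// the current observation and the previous smoothed value.
// `alpha` (0 < alpha <= 1) is a hypothetical smoothing factor; smaller values
// smooth more aggressively.
function exponentialSmoothing(values, alpha) {
  if (values.length === 0) return [];
  var smoothed = [values[0]]; // seed with the first observation
  for (var i = 1; i < values.length; i++) {
    smoothed.push(alpha * values[i] + (1 - alpha) * smoothed[i - 1]);
  }
  return smoothed;
}

var daily = [10, 20, 10, 20, 10]; // made-up sample values
var trend = exponentialSmoothing(daily, 0.5);
console.log(trend); // [ 10, 15, 12.5, 16.25, 13.125 ]
```

The smoothed series swings less than the input, which is what makes an underlying trend (or the lack of one) easier to see.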

SVG Path Mini Language

These charts are not terribly complex at first sight, but there is a nasty little detail that will trip you up if you're not prepared for it. The main issue is that you have to deal with missing values, and D3.js, JavaScript and Lumira don't necessarily play nicely together there. By default, D3.js d3.svg.line() and d3.svg.area() assume that every data point has a value, and that is not the case here. Where we have actuals, we don't have forecast and confidence intervals, and for the exponential smoothing graph, the red line has values missing at the front and back, as a result of the way the algorithm works. If you just push this through d3.svg.line(), any null values are treated as zeroes, which means you get weird "spikes" to the X-axis at the transitions between missing and present values.
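The zero spikes come from JavaScript's numeric coercion rather than from anything specific to these charts. Here is a minimal sketch using made-up data and a hypothetical `naivePath` helper that mimics pushing every point through a line generator without checking for missing values. For simplicity the values are used directly as coordinates; in a real chart a scale would map the value 0 to the X-axis.

```javascript
// Made-up series: no forecast for the first two points, values afterwards.
var forecast = [null, null, 30, 40];

// Hypothetical helper: builds a path from every point, missing or not.
// Unary plus coerces null to 0, so missing points end up at value zero.
function naivePath(values) {
  return values.map(function (v, i) {
    return (i === 0 ? "M" : "L") + (i * 10) + "," + (+v);
  }).join("");
}

console.log(+null);               // 0
console.log(naivePath(forecast)); // M0,0L10,0L20,30L30,40
```

The segment from `10,0` to `20,30` is the unwanted spike: the line is drawn from zero up to the first real value instead of simply starting there.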

However, d3.svg.line() and d3.svg.area() are just helper functions around SVG paths. You can build an SVG path by concatenating a string of command letters and coordinates. To start a line, you begin with 'M' (move to), and each subsequent step is indicated by an 'L' (line to). Each of these takes X and Y coordinates, which lets you draw whatever line or area you want. So, "M0,0L10,0L10,10L0,10L0,0" would create a little 10x10 square outline. Then, rather than push all data points through d3.svg.line() or .area(), we simply put the "pen down" the first time we see a value, and continue until the values stop again.
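The pen-down approach described above can be sketched roughly as follows. This is not the code from the extensions themselves. The helper names (`isMissing`, `buildLinePath`, `buildBandPath`) and the point shapes (`{x, y}` and `{x, lower, upper}`, already in pixel coordinates) are assumptions for illustration, and the band helper assumes the points that have bounds form one contiguous run, as a forecast interval at the end of a series would.

```javascript
// Treat null, undefined and NaN as "no value".
function isMissing(v) {
  return v === null || v === undefined || isNaN(v);
}

// Build an SVG path string for a line, lifting the pen at missing values.
function buildLinePath(points) {
  var path = "";
  var penDown = false;
  points.forEach(function (p) {
    if (isMissing(p.y)) {
      penDown = false; // gap: the next valid point starts a new segment
      return;
    }
    path += (penDown ? "L" : "M") + p.x + "," + p.y;
    penDown = true;
  });
  return path;
}

// Build a closed band (for example a confidence interval): trace the upper
// bound left to right, then the lower bound right to left, then close with Z.
function buildBandPath(points) {
  var valid = points.filter(function (p) {
    return !isMissing(p.lower) && !isMissing(p.upper);
  });
  if (valid.length === 0) return "";
  var upper = valid.map(function (p) { return p.x + "," + p.upper; });
  var lower = valid.slice().reverse().map(function (p) {
    return p.x + "," + p.lower;
  });
  return "M" + upper.concat(lower).join("L") + "Z";
}

var line = buildLinePath([
  { x: 0, y: null }, { x: 10, y: 5 }, { x: 20, y: 8 },
  { x: 30, y: null }, { x: 40, y: 3 }
]);
console.log(line); // M10,5L20,8M40,3

var band = buildBandPath([
  { x: 0, lower: null, upper: null },
  { x: 10, lower: 4, upper: 6 },
  { x: 20, lower: 3, upper: 9 }
]);
console.log(band); // M10,6L20,9L20,3L10,4Z
```

Either string can be assigned to the `d` attribute of an SVG `path` element, which is the same attribute that d3.svg.line() and d3.svg.area() output is normally bound to.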

Code and sample files

You can find all the code and sample files in SAP's lumira-viz-library on GitHub.