Dear Data Geeks,

You may have asked yourself how forecasting in SAP Visual Intelligence works behind the scenes. I want to shed some light on this area and will try to spare you some heavy maths reading. If you like the maths stuff, then you will be perfectly happy with the last section of this Wikipedia article, on “Triple Exponential Smoothing”:

http://en.wikipedia.org/wiki/Exponential_smoothing

This is the article our developers are referring to when we start asking too many questions 🙂

Let’s start simple and assume that your actual data shows the following behavior:

– a linear trend

– seasonal periods of a cycle length L

and you have at least two full cycles of actual data to base the forecast on. Then SAP Visual Intelligence forecasts the data for the next future period, i.e. one cycle (no more than one period, to keep users from placing too much trust in the forecast and to keep things simple :-).

The algorithm uses past actual data to forecast the future. The word “exponential” in the name of the algorithm means, roughly speaking, that the weight of older data in the computation decreases exponentially with its age, so older data impacts the forecast function less than newer data.
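To make the exponentially decreasing weights concrete, here is a minimal sketch of simple exponential smoothing, the one-parameter building block of the method. The smoothing factor 0.5 and the function names are illustrative assumptions of mine, not values or code taken from VI.

```python
def smoothing_weights(alpha, n):
    """Weight of the observation k steps in the past, for k = 0..n-1."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


def simple_exponential_smoothing(xs, alpha):
    """Return the smoothed series s, starting from s[0] = xs[0]."""
    s = [xs[0]]
    for x in xs[1:]:
        # New value = share of the newest observation + share of the old smoothed value
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s


# With alpha = 0.5 (an assumed value) each older observation counts half as much
print(smoothing_weights(alpha=0.5, n=4))  # [0.5, 0.25, 0.125, 0.0625]
```

A larger alpha makes the weights fall off faster, so the smoothed series follows recent data more closely.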

Example: We will use historic data of milk production in pounds per cow for each month from 1962 to 1975 and will use VI to forecast the year 1976 (note that in the current version 1.0.7, VI doesn’t forecast further into the future, such as up to 2012 or so, as I explained a minute ago).

This milk production is obviously

– growing year-over-year (and we will assume linear growth)

– periodic with cycle length L = 12 months (cows give more milk in summer)

And we are lucky enough to have more than two full cycles of actual data (in fact we have 14 full years, which makes the forecast very accurate).

If we create the forecast as shown in the screenshot above, we will be asked for a cycle length, which is obviously 12 months:

And we get this forecast (green line) versus actual data (blue line). I changed to a line chart for better visibility (forecasting is currently only available for line and bar charts with a time hierarchy on the x-axis):

You can see that the forecast function fits the data pretty well from the second cycle (1963) onwards and predicts milk production for one future cycle, here 1976. The reason why our developers decided to forecast only one cycle is to stay within the range where the algorithm has a chance to predict accurately (and also to keep usability simple).

SAP Visual Intelligence also displays the “forecast” values for dates where actual data is available, so that users can make their own judgement about the relevance or accuracy of the forecast for their own data. The chart below shows an example where the forecast doesn’t work well at all: UK crime statistical data (blue: actuals; green: forecast; yellow: linear regression).

What is the algorithm doing behind the scenes?

On a high level the algorithm assumes 3 components in time series data:

- A constant component (s(t) in the article), i.e. not evolving over time
- A linear trend (slope b(t) in the article) over time
- A periodicity in the data of cycle length L

The algorithm itself needs only one complete cycle of actual observations, given by x(1),…,x(L).

It can then provide a forecast F for times L + m, m = 1,2,3,…,L (i.e. the second cycle) by

F(L+m) = [s(L) + m × b(L)] × c(m)

where I am using a simplified notation for the special case t = L.

The article provides the general formula for F(t+m), given actual data up to any time t > L, for m = 1,2,3,…

My simplified notation shows the structure nicely:

F(m) = s + m × b is a linear function of m

And this function is multiplied by weights c(m), which represent the seasonal changes within a full cycle of length L.
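The structure of the simplified formula can be sketched in a few lines. The level, slope and seasonal weights below are made-up numbers for a cycle of length 4, and the function name is hypothetical; the sketch only illustrates "linear function times seasonal weight", not how VI estimates these values.

```python
def forecast(s, b, c, m):
    """F(L+m) = (s + m * b) * c(m) for m = 1..L, where c holds the L seasonal weights."""
    L = len(c)
    if not 1 <= m <= L:
        raise ValueError("m must be between 1 and L")
    return (s + m * b) * c[m - 1]  # c is 0-based in Python, c(m) is 1-based in the text


# Made-up values: level 100, slope 2, four seasonal weights that average to 1
c = [0.75, 1.0, 1.25, 1.0]
print([forecast(100.0, 2.0, c, m) for m in range(1, 5)])  # [76.5, 104.0, 132.5, 108.0]
```

The straight line 102, 104, 106, 108 is pushed down in the first step of the cycle and up in the third, which is exactly the seasonal pattern riding on the trend.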

You need at least one full cycle of actual data to determine the weights c(1),…,c(L). The article tells you how this is done, and VI does it for you behind the scenes.

But the quality of the forecast gets better the more actual data you have. Our developers stated that they coded it in a way that needs at least 2 cycles of actual data. The article describes the general case of how to calculate the c(i), i = 1,…,L, when you have N complete cycles of actual data.
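As a rough illustration of the N-cycle case, the sketch below follows the averaging scheme given in the Wikipedia article: each observation is divided by the mean of its own cycle, and these ratios are averaged per position across all complete cycles. The function name and the sample numbers are my own assumptions, and I can't confirm that VI initializes the weights in exactly this way.

```python
def initial_seasonal_weights(xs, L):
    """Estimate c(1)..c(L) from the complete cycles contained in xs."""
    n_cycles = len(xs) // L
    if n_cycles < 1:
        raise ValueError("need at least one full cycle of data")
    # Mean of each complete cycle
    cycle_means = [sum(xs[j * L:(j + 1) * L]) / L for j in range(n_cycles)]
    # Average, per position i in the cycle, of value / cycle mean
    return [
        sum(xs[j * L + i] / cycle_means[j] for j in range(n_cycles)) / n_cycles
        for i in range(L)
    ]


# Two made-up cycles of length 3; the second is twice as high but has the same shape
print(initial_seasonal_weights([5, 10, 15, 10, 20, 30], L=3))  # [0.5, 1.0, 1.5]
```

Because each cycle is scaled by its own mean, growth from one cycle to the next does not distort the weights much; only the shape within a cycle matters.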

Triple exponential smoothing requires a set of initial parameters to kick off the calculation. VI estimates them from the data automatically, without user interaction, to keep things simple. A better estimate might give a better prediction, but if you want to set the parameters yourself you need to purchase the more specialized product SAP Predictive Analysis. In fact there is a whole science behind how to choose the initial parameters in an optimal way. Please note that the algorithm is not able to forecast multi-periodic data, e.g. data with both a weekly and a monthly cycle. The algorithm isn’t smart enough to cater for more than one cycle length.
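For readers who want to see the whole procedure in one place, here is a minimal sketch of multiplicative triple exponential smoothing along the lines of the Wikipedia article. It is not VI's code: the smoothing factors alpha, beta and gamma are passed in by hand here, the initial level (first observation divided by its seasonal weight) is my own simplifying assumption, and the data is assumed to be strictly positive.

```python
def holt_winters_forecast(xs, L, alpha, beta, gamma):
    """Forecast the next L values of xs (linear trend, multiplicative seasonality)."""
    n = len(xs)
    n_cycles = n // L
    if n_cycles < 2:
        raise ValueError("need at least two full cycles of data")

    # Initial seasonal weights: average ratio of each value to its cycle mean
    means = [sum(xs[j * L:(j + 1) * L]) / L for j in range(n_cycles)]
    season = [
        sum(xs[j * L + i] / means[j] for j in range(n_cycles)) / n_cycles
        for i in range(L)
    ]
    # Initial trend: average per-step change between the first two cycles
    trend = sum((xs[L + i] - xs[i]) / L for i in range(L)) / L
    # Initial level: first observation with its seasonal effect removed (assumption)
    level = xs[0] / season[0]

    for t in range(1, n):
        c = season[t % L]  # seasonal weight from one cycle earlier
        prev_level = level
        level = alpha * xs[t] / c + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % L] = gamma * xs[t] / level + (1 - gamma) * c

    # Forecast step m falls at 0-based time n - 1 + m
    return [(level + m * trend) * season[(n - 1 + m) % L] for m in range(1, L + 1)]


# Made-up series: three identical cycles of length 3, no trend
print(holt_winters_forecast([50.0, 100.0, 150.0] * 3, L=3, alpha=0.5, beta=0.5, gamma=0.5))
```

A series that simply repeats 50, 100, 150 is forecast to repeat once more, which is a handy sanity check. Note that the function takes a single cycle length L, which mirrors the limitation to one periodicity mentioned above.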

I should mention the special case where L = 1. Then c(1) = 1 and the formula also works for a time series that has just a trend but no cycles, i.e. if your data is not periodic you choose the cycle length 1 in VI. VI always forecasts only one full cycle. In the special case where L = 1 this is just a forecast of the next future observation at t+1. This is why you should use linear regression instead to forecast in this case. With linear regression, SAP Visual Intelligence lets you choose how many months to extrapolate a linear trend.

BTW: Using linear regression for a process that grows period over period by a certain percentage, as many growth processes in nature and finance do, isn’t mathematically correct, because the growth isn’t linear but exponential. Be careful what you use it for: it is not a crystal ball that always tells the exact future 🙂
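A small sketch shows how far off a straight line can be for percentage growth. The series below is made up (it starts at 100 and grows 10% per period), and fitting the line to the logarithm of the data is just one common way to handle such growth; it is not a VI feature described in this post.

```python
import math


def fit_line(ys):
    """Least-squares intercept and slope for ys observed at t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


# Made-up series: 12 periods of 10% growth
ys = [100 * 1.1 ** t for t in range(12)]
horizon = 24  # extrapolate well beyond the observed data

# Straight line fitted to the raw values
a, b = fit_line(ys)
linear_forecast = a + b * horizon

# Straight line fitted to the logarithms, then transformed back
log_a, log_b = fit_line([math.log(y) for y in ys])
exponential_forecast = math.exp(log_a + log_b * horizon)

true_value = 100 * 1.1 ** horizon
print(round(linear_forecast), round(exponential_forecast), round(true_value))
```

The straight line falls far short of the true value at period 24, while the fit on the logarithms recovers it, because taking logs turns percentage growth into a linear trend.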

Frank,

Thank you for taking a shot at this topic. Well done!

Very nice article... It exposes in a deep way how the forecast formula works. And the final thought is a really nice disclosure.