# HANA Predictive analysis –Single Exponential Smoothing

Introduction

Smoothing algorithms are basically used in time series data either to produce smoothed data for presenting the trend of the data or to forecast it for what if analysis. Time series data are sequential observations from the history of data with respect to a series of date, time or time stamps. Moving average analysis is also the same kind of time series analysis where the past observations are equally weighted. But, for certain analysis (price movement or share market stock movement), the recent past data has the most weight. Exponential smoothing assigns exponentially decreasing weights over time.

Pre requisites:

1. No missing/null data
2. Only numeric data can be smoothed

Algorithm

Let St be the smoothed value for the t’th time period and Xn (1, 2, 3…n) be the time series; mathematically,

S1 = x0 (you cannot get the smoothed value for the first entry in the time series).

St = αxt−1 + (1−a) St−1 where α is the smoothing factor (in %- Mathematically 0 < α < 1). When Alpha tends to 0, the weight given to the history value reduced and tends to 0.

Let us consider an example of price trend of a particular material over a series of date range.

 DAY PRICE 2-JUNE-2014 100 3-JUNE-2014 95 4-JUNE-2014 110 5-JUNE-2014 110 6-JUNE-2014 98 7-JUNE-2014 Holiday 8-JUNE-2014 Holiday 9-JUNE-2014 105 10-JUNE-2014 118

There is no data available for 7th June and 8th June. But, as per the algorithm null or missing series are not allowed. In this case, if the previous data is not available corresponding smoothed value will be taken as Xt for those entries.

Now, let us convert the input table into time series and manually apply the smoothing algorithm, with smoothing factor as 50 %( 0.5). Consider the first DAY as the base date.

 DAY Time PRICE Smoothed Value ( St) 2-JUNE-2014 0 100 3-JUNE-2014 1 95 100 4-JUNE-2014 2 110 97.5 5-JUNE-2014 3 110 103.75 6-JUNE-2014 4 98 106.875 7-JUNE-2014 5 Holiday 102.4375 8-JUNE-2014 6 Holiday 102.4375 9-JUNE-2014 7 105 102.4375 10-JUNE-2014 8 118 103.71875

Calculation

1. S(0) will be null
2. S(1) will be , 0.5 * 100(which is Xt-1) + 0.5 * 100 = 100
3. S(2) will be , 0.5 * 95 + 0.5 * 100 = 97.5
4. S(3) will be , 0.5 * 110 + 0.5 * 97.5 = 103.75
5. S(4) will be ,05.*110 + 0.5 * 103.75 = 106.875
6. S(5) will be , 0.5 * 98 + 0.5 * 106.875 = 102.4375
7. S(6) will be , 0.5 * 102.4375(previous time series values in not available, hence the smoothed value is considered ) + 0.5 * 102.4375 = 102.4375
8. The same process continues for the upcoming entries and can be forecast up to n number of time series entries. Graph simulated with smoothed data in Microsoft Excel

PAL Implementation

(Source code from SAP HANA PAL Document is re used to generate the below code snippet – Page 193 ).

``````
CREATE SCHEMA PAL_TRY;
SET SCHEMA PAL_TRY;
--dropping existing procedures if any
CALL SYSTEM.AFL_WRAPPER_ERASER('SINGLESMOOTH_TEST_PROC');
--Creating procedures
CALL SYSTEM.AFL_WRAPPER_GENERATOR('SINGLESMOOTH_TEST_PROC','AFLPAL','SINGLESMOOTH',PAL_SINGLESMOOTH_PDATA_TBL);
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL ("NAME" VARCHAR(100),
"INTARGS" INT, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
--RAW_DATA_COL : column where the data is available
INSERT INTO #PAL_CONTROL_TBL VALUES ('RAW_DATA_COL',1,NULL,NULL);
--Alpha value
INSERT INTO #PAL_CONTROL_TBL VALUES ('ALPHA', NULL,0.5,NULL);
--Forecast_num : Forecast next 100 values(Includes future values )
INSERT INTO #PAL_CONTROL_TBL VALUES ('FORECAST_NUM',100, NULL,NULL);
--STARTTIME : ID starts from 0
INSERT INTO #PAL_CONTROL_TBL VALUES ('STARTTIME',0, NULL,NULL);
CREATE COLUMN TABLE PAL_SINGLESMOOTH_DATA_TBL LIKE PAL_SINGLESMOOTH_DATA_T ;
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (0,100.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (1,95.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (2,110.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (3,110.5);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (4,98.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (7,105.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (8,118.0);
CREATE COLUMN TABLE PAL_SINGLESMOOTH_RESULT_TBL LIKE PAL_SINGLESMOOTH_RESULT_T;
--Executing the procedure
CALL _SYS_AFL.SINGLESMOOTH_TEST_PROC(PAL_SINGLESMOOTH_DATA_TBL,"#PAL_CONTROL_TBL", PAL_SINGLESMOOTH_RESULT_TBL) WITH OVERVIEW;
SELECT * FROM PAL_SINGLESMOOTH_RESULT_TBL;``````

The output table PAL_SINGLESMOOTH_RESULT_TBL will contain the smoothed data which can be used for the analysis or presentation purposes.

Regards

Sreehari V Pillai

/ You must be Logged on to comment or reply to a post.
• Thank you, Sreehari!

smoothing algorithms require (TIME_ID, VALUE)   table structure for the source data. Do you have an idea how to generate this integer TIME_ID which starts from 0, if your source table only has DATE and VALUE, like in your example?

I talk about the case when we want to apply PAL function to an existing table/view.

Thank you one more time,

Kostia Kharchenko

• Hello Kostia, I am not sure if this will be helpful in your scenario – however, what I have done in some cases in the past is convert the DATE to a integer value. I do this by creating a View on top of my existing table, and then subsequently using the view as the input into the PAL function.

Here is an example,

SELECT TO_INT(TO_CHAR(UTCTOLOCAL(DT_ADDED, ‘” + strTimeZone + “‘), ‘HH24’)) ID

• Hi Paul, nice to see you here!

Thank you, I might go on with your solution.

UPD: unfortunately, it doesn’t fit in my scenario, because i convert dates YYYYMMDD and they are not continuous, ex: 20141231 + 1  will be 20141232

still thank you for the advice!

• Sreehari V Pillai Post author

Hi Kostial,

For your scenario, you can consider the least value of the date column and make it as the base date. For eg, let Jan 1 ,2014 be your base date so that integer value corresponding to Jan 1,2014 will be 0. For other dates comes after this, find the number of days between the base date and the current value. For eg, feb 1,2014. Here, your integer value will be 31. Like that you can follow.

@Paul : Thanks for your view on this 🙂

Sreehari.

• Hi,

is there a possibility to find out with SAP Predictive Analysis which smoothing method (1st, 2nd or 3rd) is suitable for the existing time series data set?

How can we optimize the method in order to alpha?

Whats about the figure goodness of fit – which statement does this figure deliver and how it has been calculated?

• Sreehari V Pillai Post author

Hi Can,

That’s a nice question by the way. Generally you are asking two questions. one, how to choose which exponential smoothing method is to be chosen for a time series . Second, how to choose values for alpha for a better smoothing.

How to choose right values for alpha : As I said, for lesser values of Alpha , the weightage you are giving to your historical data is reduced. For example , market price forecast. you are not bothered about what was the value of a commodity past one year, but just past months or just yesterday. So, here the alpha value would tend to zero.

How to choose the model : You have to apply forecasting on your time series and find out the deviation from  original value. That is , suppose you have a time series with time values 0 to 100. Consider 0 to 50 values and forecast till 100. Find the deviation from original value for all the three models. Which model is giving the closer forecast to your time series, you can choose it . May be I will be writing a blog on Choosing the right model for PAL.

Sreehari V Pillai

• Hi Sreehari,

first of all great thanks for answer. So if i have understood it correctly deciding the suitable smoothing method is based on “learning by trying”. Are there no statistical figures like autocorrelationcoefficient which gives out the attributes of the time series data like trends, or saisonable characteristics?

My next questions are: How does the figure “Goodness of fit” is been calculated? Thanks in advance

• Sreehari V Pillai Post author

If I understood the smoothing algorithms correctly, these algorithms basically simulate the trend and seasonal characteristics . Have a look into the linear regression – Linear regression – Wikipedia, the free encyclopedia (I m sure it will take months to understand these – most times I failed 🙂 ).

For the goodness of fit calculation, there are standard algorithms available ( like K-S algorithm ) . I don’t think HANA PAL provides those algorithms . Let me check for the alternative features if any.

Sreehari