Dear community,

I recently installed PA 2.2 to test its “Time series analysis”.

This blog describes the steps I took to reach the final result. I also have some questions, since the final outcome seems very poor to me.

After watching this video (http://scn.sap.com/docs/DOC-62239), I prepared a list of ~130 German companies with ~20 stock market key figures each for the last 115 consecutive weeks.

The import file contained weekly data from CW 36/2013 to CW 46/2015, and the outcome I expected was a forecast of the share value up to CW 5/2016.

Fortunately I had good help from a working student, who developed the “structure” file for me.

It seems to work but we are not sure if it is the best setup (further information is appreciated).

Structure.png

This blog focuses on one example company, “SAP”, for which a trend line was generated.

Question 1: Why do some results show trend lines and others don’t?

For my variables I used only “total value” key figures and avoided mixing them with percentage key figures.

Variables.png

I chose 12 future weeks to predict:

TimeSeries.png

Warning message shown:

Warning.png

Obviously it can “predict” only 4 weeks?

Question 2: What does this warning mean? I have seen other warnings with a maximum horizon of 2 or 3; this one gives 4.

UPDATE: I forgot to include the following screenshot:

Protocol.png

However, I continued and this is the result … quite …hm…  strange … or ridiculous 😀

SAP Forecast.png

The table shows the whole “catastrophe”… almost only 40% variance between minimum and maximum.

FC_vs_Signal_SAP.png

This and several other results look as if the forecast had been found by rolling dice.

Another highlight, Lufthansa: up & down and up & down:

LH_Wuerfeln.png FC_vs_Signal_LH.png

Finally I have some more questions and would love to learn more about the tool and “Time series analysis”:

3) How can the structure file be optimized? Is there a how-to or SCN document/blog available?

4) Is there a way to analyze more than only one company at a time?

I would like to load the whole DAX (German main index) and use the same ~20 key figures of all companies for finding the results per company.

Since all shares have the “same attention” (like when in DAX or SDAX or MDAX) I would like to use additional “trends” within the market for analysis.

Is there somehow a “learning effect” I can initiate in the tool by using different data with same variables?

5) Is there a way to combine no. 4) “more companies at once” with getting only the trend lines per company as a result, not the single predicted values?

6) How do I find out which of the 20 key figures I should keep or change for better results?

7) How does PA deal with mixed input of “total values” and “percentage key figures”?

8) How can I tell PA which relationships exist between key figures, e.g. those that are the outcome of a formula using the weekly share value?

9) I checked the logs and found statements like:

“The automatic variable selection process discarded all the extra-predictable variables when estimating the trend(<list-of-variables>)” or

“The trend model (Regression<list-of-variables>…has been discarded from the competition.” What does this mean?

Does this mean that all 20 key figures in my file are ignored and the forecast is based only on the historic share values? What could be the reason?

Thanks for reading… and any feedback is appreciated 🙂

Best regards,

Martin


7 Comments


  1. Tammy Powlas

    Hi Martin – this is very good, as usual you provide great content

    Why not ask your questions in the discussion area and reference this blog?  I have seen SAP reply to questions there…

  2. Pierpaolo VEZZOSI

    Hello Martin,

    nice blog, you are asking yourself a lot of important questions that each analyst creating time series should ask. That’s good 🙂

    As per the answers, we can provide a reason for the behaviour you are seeing but, to make sure we get the full picture, could you please post the image of the Model Overview page showing the information below? And, if possible, share your data file?

    24-11-2015 08-23-50.jpg

    Also, as Tammy suggests, it would be better to link to this article from the Discussion page and submit the questions there.

    Best regards
    PPaolo

  3. Martin Kreitlein Post author

    Thanks for the feedback… will review this week.

    @ Pierpaolo Vezzosi I included the missing screenshot above.

    I don’t know how to attach an example file here, so you can download from:

    http://www.spielwiese.imlebe.net/SAP_source_data.csv

    It contains the first 6 items indicated in the structure file screenshot above.

    Should be sufficient to have these, since the log stated that the rest of the key figures had not been used, anyway.

    1. Pierpaolo VEZZOSI

      Hello Martin,

      I’ll go into more detail in my answer in the discussion you opened here: Re: Several questions on PA – Time series analysis

      To close the thread in this blog, let me provide a quick overview answer.

      First of all, predicting stock values is quite a complex thing; I guess that if it were easy I wouldn’t be here writing this answer.

      Second point, the ‘strange’ results you see are not necessarily due to the product or the algorithm: the information used to build a model might not be good enough to provide a good forecast. Typically, when choosing the historical data, we need to use some business knowledge to identify information which could actually or potentially have an impact on the target; if there is no impact, then no useful model can be created.

      Before examining the results please do read the blog mentioned above (Re: What kinds of time series models are included?) where we explain that the result of a time series forecast is the combination of three components:  a trend (long term direction of the signal), cycles (changes related to specific times, to specific periods or specific events) and fluctuations (changes related to previous values of the signal).
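To make the three components concrete, here is a minimal Python sketch of an additive split into trend, cycle and fluctuation. It is not the algorithm PA uses internally: the `decompose` function, the synthetic series and the 13-step period are all assumptions made up for illustration.

```python
import numpy as np

def decompose(signal, period):
    """Split a series into trend + cycle + fluctuation (additive, illustrative)."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal))
    # Trend: straight line fitted by least squares.
    slope, intercept = np.polyfit(t, signal, 1)
    trend = slope * t + intercept
    detrended = signal - trend
    # Cycle: average detrended value at each position within the period.
    phase = t % period
    cycle_means = np.array([detrended[phase == k].mean() for k in range(period)])
    cycle = cycle_means[phase]
    # Fluctuation: whatever the trend and the cycle do not explain.
    fluctuation = detrended - cycle
    return trend, cycle, fluctuation

# Synthetic example: 115 time steps, like the weekly data in the blog.
rng = np.random.default_rng(0)
t = np.arange(115)
signal = 50 + 0.2 * t + 3 * np.sin(2 * np.pi * t / 13) + rng.normal(0, 1, 115)
trend, cycle, fluctuation = decompose(signal, period=13)
```

By construction the three parts add back up to the original signal; a forecast would extend each part separately and sum the results.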

      Now, looking at the results you have: the model has identified a linear trend and an autoregressive fluctuation based on the past 20 occurrences, no cycles were identified; all extra-predictive variables have been excluded and the model is based only on the date.

      Let’s examine this result.

      1. The trend doesn’t appear to be influenced by the extra variables you added; if you look at the red line showing it, it looks like a good choice for the actual data.
      2. The fluctuations are an autoregression based on the past 20 occurrences: the algorithm tries to see whether the value of the signal (the stock value), after removing the trend, can be predicted from the values of the past 20 days. Behind the scenes the tool has tested many other lengths (from 2 days up to 450 by default, if you have enough data in your dataset). Apparently those 20 days are the ones which minimize the error. This has a business meaning: when you think of a stock value, you expect it to vary only slightly from day to day (e.g. stocks typically vary by a couple of percentage points per day at most; if they move more than that, it gets into the stock market news as something unusual). So a 20-day period, which is more or less a working month, could be a good fit for a stock value which is quite stable over time and doesn’t often have spikes.
      3. The extra-predictable variables have been excluded, and there might be a few reasons for this. A first reason is that they don’t actually influence the model. A second reason is that you set them as ‘nominal’, and ‘nominal’ variables cannot be taken into account for the cyclic signal. You can try setting them to ‘ordinal’ (if the data is actually ‘ordinal’, i.e. discrete and ordered) and see if something changes.
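The idea of testing several autoregression lengths and keeping the one with the smallest error can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PA's actual procedure: `fit_ar`, `select_lag`, the hold-out size of 20 points and the synthetic series are all hypothetical.

```python
import numpy as np

def fit_ar(residual, lag):
    """Least-squares fit of an AR(lag) model; returns coefficients, oldest lag first."""
    residual = np.asarray(residual, dtype=float)
    n = len(residual)
    # Row j holds residual[j] .. residual[j + lag - 1]; the target is residual[j + lag].
    X = np.column_stack([residual[i:n - lag + i] for i in range(lag)])
    y = residual[lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def select_lag(residual, candidate_lags, n_test=20):
    """Pick the lag with the smallest one-step-ahead RMSE on the last n_test points."""
    residual = np.asarray(residual, dtype=float)
    train = residual[:-n_test]
    best_lag, best_rmse = None, np.inf
    for lag in candidate_lags:
        coef = fit_ar(train, lag)
        errors = [residual[k] - residual[k - lag:k] @ coef
                  for k in range(len(train), len(residual))]
        rmse = float(np.sqrt(np.mean(np.square(errors))))
        if rmse < best_rmse:
            best_lag, best_rmse = lag, rmse
    return best_lag, best_rmse

# Synthetic detrended signal with 115 points (an assumption, not the real stock data).
rng = np.random.default_rng(1)
series = np.zeros(115)
for k in range(2, 115):
    series[k] = 0.6 * series[k - 1] - 0.3 * series[k - 2] + rng.normal()

best_lag, best_rmse = select_lag(series, candidate_lags=[1, 2, 5, 10, 20])
print(best_lag, best_rmse)
```

The error is measured on held-out points rather than on the training data, because an in-sample fit would almost always favour the longest lag.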

      I hope this helps in understanding the results for the posted test.

      For detailed answers on the questions, please check the discussion page about this blog: Re: Several questions on PA – Time series analysis

      1. Martin Kreitlein Post author

        Thanks for the feedback.

        The “20 days” in my data are actually the values of 20 weeks (stock market values on Friday evening; I just converted each date to the first day of its week).
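As a small illustration of that conversion, the following Python sketch maps a Friday closing date to the first day of its week. It assumes that “first day” means Monday, as in ISO calendar weeks; the helper name `week_start` is made up for this example.

```python
from datetime import date, timedelta

def week_start(d):
    """Return the Monday of the ISO week that contains d."""
    return d - timedelta(days=d.weekday())

friday = date(2015, 11, 13)          # Friday of CW 46/2015
monday = week_start(friday)
print(monday, monday.isocalendar()[1])
```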

        I will soon post another blog, after having found out and tested more…
