My first steps in SAP PA – Time series analysis
Dear community,
I recently installed PA 2.2 to test its "Time series analysis" feature.
This blog describes the steps I took to reach the final result. I also have some questions, since the final outcome seems very poor to me.
After watching this video (http://scn.sap.com/docs/DOC-62239), I prepared a list of ~130 German companies with ~20 stock market key figures for the last 115 consecutive weeks.
The import file contained weekly data from CW 36/2013 to CW 46/2015, and my expected outcome was the share value by CW 5/2016.
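As a quick sanity check on these numbers, the sketch below counts calendar weeks with Python's standard `datetime` module. It assumes that "CW" means the ISO 8601 week number, the usual convention in Germany. Under that assumption, CW 36/2013 to CW 46/2015 spans 115 weekly rows, and CW 5/2016 lies 12 weeks after the last observed week, because 2015 has 53 ISO weeks. The helper names are only illustrative.

```python
from datetime import date

def iso_week_start(year, week):
    """Return the Monday of the given ISO 8601 calendar week."""
    return date.fromisocalendar(year, week, 1)

def weeks_apart(start, end):
    """Number of whole weeks between two (year, week) pairs."""
    return (iso_week_start(*end) - iso_week_start(*start)).days // 7

# Weekly rows in the import file: both endpoints are included.
history = weeks_apart((2013, 36), (2015, 46)) + 1
# Weeks to forecast after the last observed week (CW 46/2015).
horizon = weeks_apart((2015, 46), (2016, 5))

print(history, horizon)  # 115 12
```

This matches the 12 future weeks chosen below.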
Fortunately, I had good help from a working student, who developed the "structure" file for me.
It seems to work, but we are not sure whether it is the best setup (any further information is appreciated).
This blog focuses on one example company, "SAP", for which a trend line was generated.
Question 1: Why do some results show trend lines and others don’t?
For my variables, I used only "total value" key figures and avoided mixing them with percentage key figures.
I chose to predict 12 future weeks:
The following warning message was shown:
Apparently it can "predict" only 4 weeks?
Question 2: What does this warning mean? Some other warnings I found stated a maximum horizon of 2 or 3; this one states 4.
UPDATE: I forgot to include the following screenshot:
However, I continued and this is the result … quite …hm… strange … or ridiculous 😀
The table shows the whole "catastrophe"... a spread of almost 40% between the minimum and maximum values.
...This and several other results look as if the forecast had been found by rolling dice.
Another highlight, Lufthansa: up & down and up & down:
Finally, I have some more questions and would love to learn more about the tool and "Time series analysis":
3) How can the structure file be optimized? Is there a how-to guide or an SCN document/blog available?
4) Is there a way to analyze more than one company at a time?
I would like to load the whole DAX (German main index) and use the same ~20 key figures of all companies for finding the results per company.
Since all shares in an index receive the "same attention" (e.g. in the DAX, MDAX or SDAX), I would like to use additional market-wide "trends" in the analysis.
Is there a way to trigger some kind of "learning effect" in the tool by using different data with the same variables?
5) Is there a way to combine 4) "more companies at once" with getting only the trend line per company as the result... rather than the individual predicted values?
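Outside of PA, the "trend line per company" part of questions 4) and 5) can be sketched in a few lines: fit an ordinary least-squares line to each company's weekly share values and keep only the intercept and slope. This is not a PA feature or API. The function name and the dict-of-lists input format are hypothetical, and a straight line ignores any cycles or fluctuations.

```python
import numpy as np

def trend_lines(prices_by_company):
    """Fit price ~ intercept + slope * week_index for each company.

    prices_by_company: hypothetical input, a dict mapping a company name to its
    weekly share values in chronological order.
    Returns a dict mapping company -> (intercept, slope per week).
    """
    lines = {}
    for company, prices in prices_by_company.items():
        y = np.asarray(prices, dtype=float)
        t = np.arange(len(y))
        # np.polyfit returns coefficients from the highest degree down.
        slope, intercept = np.polyfit(t, y, deg=1)
        lines[company] = (float(intercept), float(slope))
    return lines

# Usage with made-up numbers (not real share values):
example = trend_lines({"Company A": [10, 12, 14, 16], "Company B": [50, 49, 48, 47]})
for name, (intercept, slope) in example.items():
    print(f"{name}: intercept={intercept:.2f}, slope={slope:.2f} per week")
```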
6) How do I find out which of the 20 key figures I should keep or change for better results?
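One simple, commonly used heuristic for question 6), not necessarily what PA does internally, is to rank each key figure by how strongly its value in week t correlates with the share value in week t + lag, since only past key figure values are known at forecast time. The sketch below assumes equally long, aligned weekly series; the function name and input format are hypothetical.

```python
import numpy as np

def rank_by_lagged_correlation(target, candidates, lag=1):
    """Rank key figures by |corr(candidate[t], target[t + lag])|.

    target: weekly share values; candidates: dict of name -> weekly series of
    the same length, aligned with target (hypothetical format).
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    future_target = np.asarray(target, dtype=float)[lag:]
    scores = {}
    for name, series in candidates.items():
        past_values = np.asarray(series, dtype=float)[:-lag]
        # A constant series has no correlation; np.corrcoef would return nan.
        if np.std(past_values) == 0:
            scores[name] = 0.0
            continue
        scores[name] = abs(float(np.corrcoef(past_values, future_target)[0, 1]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up example: "leading" anticipates the share value by one week.
share = [1, 2, 3, 4, 5, 6]
ranking = rank_by_lagged_correlation(
    share, {"leading": [2, 3, 4, 5, 6, 7], "noise": [1, -1, 1, -1, 1, -1]}
)
print(ranking)
```

A high score only says that a key figure moved together with the share value in the past; it does not guarantee better forecasts.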
7) How does PA deal with mixed input of “total values” and “percentage key figures”?
8) How can I tell PA which relationships exist between key figures, e.g. those that are the outcome of a formula using the weekly share value?
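Regarding question 8), key figures computed from the share value itself largely duplicate the target instead of adding information. Market capitalisation, if it were among them, is simply the share price times the number of shares. A minimal sketch for flagging such figures, assuming aligned weekly series and a hypothetical R² threshold, is to regress each key figure on the share value and report those that are almost perfectly explained by it.

```python
import numpy as np

def derived_from_target(target, candidates, threshold=0.99):
    """Flag key figures that are (almost) a linear function of the share value.

    Fits candidate ~ a + b * target by least squares and flags candidates
    whose R^2 reaches the (hypothetical) threshold.
    """
    y = np.asarray(target, dtype=float)
    flagged = []
    for name, series in candidates.items():
        x = np.asarray(series, dtype=float)
        ss_tot = float(np.sum((x - x.mean()) ** 2))
        if ss_tot == 0:
            continue  # a constant series carries no information at all
        b, a = np.polyfit(y, x, deg=1)
        ss_res = float(np.sum((x - (a + b * y)) ** 2))
        if 1.0 - ss_res / ss_tot >= threshold:
            flagged.append(name)
    return flagged

share = [100, 102, 101, 105, 107]  # made-up weekly share values
shares_outstanding = 1_000_000      # hypothetical, constant over the period
figures = {
    "market_cap": [p * shares_outstanding for p in share],
    "volume": [5, 1, 4, 2, 3],
}
print(derived_from_target(share, figures))  # ['market_cap']
```

Ratios such as a price/earnings figure are not exactly linear in the price when earnings change, so they may fall just below the threshold.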
9) I checked the logs and found statements like:
“The automatic variable selection process discarded all the extra-predictable variables when estimating the trend(<list-of-variables>)” or
“The trend model (Regression<list-of-variables>…has been discarded from the competition.” What does this mean?
Are all 20 key figures in my file ignored, so that the forecast is based only on the historical share values? What could be the reason?
Thanks for reading… and any feedback is appreciated 🙂
Best regards,
Martin
Hi Martin, this is very good; as usual, you provide great content.
Why not ask your questions in the discussion area and reference this blog? I have seen SAP reply to questions there...
nice one 🙂
Hello Martin,
Nice blog; you are asking yourself many important questions that every analyst creating time series should ask. That's good 🙂
As for the answers, we can explain the behaviour you are seeing, but to make sure we get the full picture, could you please post an image of the Model Overview page showing the information below and, if possible, share your data file?
Also, as Tammy suggests, it would be better to link to this article from the Discussion page and submit the questions there.
Best regards
PPaolo
You can also check this discussion, which tries to explain what's inside a Time Series analysis. By the way, from the look of it you are detecting a fluctuation signal based on lags, which would explain the messages you receive, but please post the information requested above so that we can check more thoroughly.
Thanks for the feedback... I will review it this week.
@Pierpaolo Vezzosi: I included the missing screenshot above.
I don't know how to attach an example file here, so you can download it from:
http://www.spielwiese.imlebe.net/SAP_source_data.csv
It contains the first 6 items listed in the structure file screenshot above.
These should be sufficient, since the log stated that the remaining key figures had not been used anyway.
Hello Martin,
I'll go into more detail in my answer in the discussion you opened here:
To close the thread on this blog, let me provide a quick overview.
First of all, predicting stock values is quite complex; I guess that if it were easy, I wouldn't be here writing this answer ☺.
Second, the 'strange' results you see are not necessarily due to the product or the algorithm: the information used to build a model might not be good enough to provide a good forecast. Typically, when choosing the historical data, we need to use business knowledge to identify information that could actually or potentially have an impact on the target; if there is no impact, no useful model can be created.
Before examining the results, please do read the blog mentioned above (Re: What kinds of time series models are included?), where we explain that the result of a time series forecast is the combination of three components: a trend (the long-term direction of the signal), cycles (changes related to specific times, specific periods or specific events) and fluctuations (changes related to previous values of the signal).
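To make the three components more concrete, here is a simplified additive decomposition in Python. It is an illustration under stated assumptions, not PA's algorithm: the trend is a least-squares line, the cycle is the average detrended value at each position of a known, fixed period (PA detects cycles by itself), and the fluctuation is whatever remains.

```python
import numpy as np

def decompose(y, period):
    """Split y into trend + cycle + fluctuation (simplified additive sketch)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    trend = intercept + slope * t
    detrended = y - trend
    # Cycle: average detrended value at each position within the period,
    # centred so that one full cycle sums to zero.
    profile = np.array([detrended[i::period].mean() for i in range(period)])
    profile -= profile.mean()
    cycle = profile[t % period]
    # Fluctuation: everything the trend and the cycle do not explain.
    fluctuation = y - trend - cycle
    return trend, cycle, fluctuation

# Synthetic example: a rising line plus a repeating 4-week pattern.
weeks = np.arange(48)
signal = 50 + 0.5 * weeks + np.tile([2.0, -1.0, -1.0, 0.0], 12)
trend, cycle, fluctuation = decompose(signal, period=4)
print(np.round(cycle[:4], 2))
```

In this sketch the fluctuation part is left unmodelled; an autoregressive model would try to predict it from its own previous values.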
Now, looking at your results: the model has identified a linear trend and an autoregressive fluctuation based on the past 20 occurrences, and no cycles were identified. All extra-predictable variables have been excluded, so the model is based only on the date.
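The following sketch shows what "linear trend plus autoregressive fluctuation" means for a forecast, using a hypothetical simplified estimator rather than PA's own: a line is fitted on the time index, an AR(p) model is fitted by least squares on the residuals, and each forecast step feeds earlier predictions back in. The model above uses 20 lags; p = 2 keeps the example small. When the AR coefficients describe an oscillation, the forecast oscillates too, which may be one way to read an "up & down" shape.

```python
import numpy as np

def fit_trend_ar(y, p):
    """Fit y[t] = a + b*t + r[t] with r[t] = phi_1*r[t-1] + ... + phi_p*r[t-p]."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    b, a = np.polyfit(t, y, deg=1)
    r = y - (a + b * t)  # residuals around the linear trend
    # Column k-1 holds r[t-k] for every t >= p (the lagged residuals).
    lagged = np.column_stack([r[p - k:len(r) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(lagged, r[p:], rcond=None)
    return a, b, phi, r

def forecast(y, p, horizon):
    """Forecast recursively: each step reuses earlier predicted residuals."""
    a, b, phi, r = fit_trend_ar(y, p)
    residuals = list(r)
    n = len(r)
    predictions = []
    for h in range(horizon):
        recent = residuals[-1:-p - 1:-1]  # r[t-1], r[t-2], ..., r[t-p]
        r_next = float(np.dot(phi, recent))
        residuals.append(r_next)
        predictions.append(a + b * (n + h) + r_next)
    return np.array(predictions)

# Made-up series: flat level 100 plus a repeating +1, -1, -1, +1 pattern.
series = 100 + np.tile([1.0, -1.0, -1.0, 1.0], 4)
print(np.round(forecast(series, p=2, horizon=4), 2))  # approximately [101, 99, 99, 101]
```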
Let’s examine this result.
I hope this helps in understanding the results of the posted test.
For detailed answers to the questions, please check the discussion page about this blog:
Thanks for the feedback.
The "20 days" in my data are actually the values of 20 weeks (stock market values on Friday evening... I just converted each of them to the first day of its week).
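The conversion described above, stamping each Friday closing value with the Monday of the same week, is a one-liner with Python's `datetime`. The helper name is hypothetical. With one row per week, a lag of 20 rows then corresponds to 20 weeks, not 20 days.

```python
from datetime import date, timedelta

def week_start(day):
    """Return the Monday of the week containing `day` (date.weekday(): Monday == 0)."""
    return day - timedelta(days=day.weekday())

# Friday closing values, stamped with the Monday of the same week.
fridays = [date(2015, 11, 6), date(2015, 11, 13)]
print([week_start(d).isoformat() for d in fridays])  # ['2015-11-02', '2015-11-09']
```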
I will post another blog soon, after I have found out and tested more...