Custom R Component: Bulk Forecasting by Month 2.0
Update July 2016: This article relates only to the Expert mode in SAP Predictive Analytics.
Please see these articles on forecasting in Automated mode, which allows, for instance, for additional predictor variables.
This component adds the capability to SAP Predictive Analysis to automatically forecast many different monthly time series in one go.
Background
Imagine your company sells 500 products and you need to forecast the sales quantities of each product for the next 6 months. It would take too much effort to look at each product manually and to forecast its sales quantity individually. This component automates such a forecasting process. It looks at each product individually and, based on the product's history, finds the best forecasting model and configuration for this very product. Once the forecast for one product is done, the component moves on to the next product to find the model and configuration that describes that product best, and so on. The component aims to automate such monthly forecasting as much as possible.
The major functions of this component are:
- The user can choose by which column to forecast (country, product, etc.).
- The input data does not have to be aggregated; the component aggregates and sorts the data automatically.
- Missing data (i.e. no sales of a product in one month) is added to the dataset with a measure value of zero. Therefore the component works on transactional data.
- The user can decide how many months to forecast.
- The user can decide which model type to use (AUTO.ARIMA, ETS or an average of multiple models).
- The component can also do a 12-month hold-out evaluation to find the best model type, which is then used to forecast.
- The forecast models can be restricted to non-negative forecasts (to avoid negative revenue forecasts, for instance).
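To make the aggregation and zero-filling more concrete, here is a minimal sketch in base R. It is not the component's actual code. The function name `prepare_monthly`, the sample data and the choice to fill every series over the same overall range of months are assumptions made for this illustration.

```r
# Hypothetical transactional input: several rows per month, and product B
# has no sales at all in February 2015.
sales <- data.frame(
  Product  = c("A", "A", "A", "A", "B", "B"),
  Year     = c(2015, 2015, 2015, 2015, 2015, 2015),
  Month    = c(1, 1, 2, 3, 1, 3),
  Quantity = c(10, 5, 7, 12, 4, 6),
  stringsAsFactors = FALSE
)

prepare_monthly <- function(data, series_col, year_col, month_col, measure_col) {
  # 1. Aggregate: one row per series and month (sum of the measure).
  agg <- aggregate(
    data[[measure_col]],
    by = list(Series = as.character(data[[series_col]]),
              Year   = data[[year_col]],
              Month  = data[[month_col]]),
    FUN = sum
  )
  names(agg)[names(agg) == "x"] <- "Measure"

  # 2. Running month index, so that consecutive months differ by exactly 1.
  agg$Idx <- as.integer(agg$Year * 12 + (agg$Month - 1))

  # 3. Full grid: every series combined with every month in the overall range.
  grid <- expand.grid(Series = unique(agg$Series),
                      Idx = seq(min(agg$Idx), max(agg$Idx)),
                      stringsAsFactors = FALSE)

  # 4. Left join the aggregates onto the grid; months without data become zero.
  out <- merge(grid, agg[, c("Series", "Idx", "Measure")],
               by = c("Series", "Idx"), all.x = TRUE)
  out$Measure[is.na(out$Measure)] <- 0

  # 5. Recover year and month, then sort chronologically within each series.
  out$Year  <- out$Idx %/% 12
  out$Month <- out$Idx %% 12 + 1
  out <- out[order(out$Series, out$Idx), c("Series", "Year", "Month", "Measure")]
  rownames(out) <- NULL
  out
}

monthly <- prepare_monthly(sales, "Product", "Year", "Month", "Quantity")
print(monthly)
```

In the result, product A has its two January rows summed into one, and product B gets an extra February row with a measure of zero.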
Forecasting sales quantities is only an example. This component works with any kind of monthly data that needs to be forecasted:
- Revenue
- Customer Numbers
- Number of Product Returns
- Traffic statistics
- Average monthly weather data
- and many, many more.
This component builds heavily on the forecast package. Please see the documentation of this package for further information on the forecasting concepts.
If you are new to creating Custom R Components in SAP Predictive Analysis, you can have a look at this overview to get you started. Please note that this code is not supported by SAP. When using this function please carry out your own testing.
This component calculates and compares many different models, hence execution time might be long. Please start with a small dataset that holds only a few time series. I have attached some small sample data on road accidents in Switzerland. You can forecast by either accident location or accident severity.
Prerequisites
The historic dataset has numerical columns for year and month.
The R-library forecast is installed.
Usage
These parameters can be set by the user:
Parameter | Description |
---|---|
Measure to Forecast | Name of column that holds the measure that is to be forecasted. |
Time Series by | Name of column that identifies the individual time series that are to be forecasted (i.e. forecast by product, country, …). |
Year | Name of column that holds the year number. |
Month | Name of column that holds the month number. |
Forecasting Concept | Concept that is used for the forecast. Possible values: ‘AUTO.ARIMA’, ‘ETS’, ‘Composite’, ‘12 Months Hold-Out’ |
Months to Forecast | Number of months that are to be forecasted. |
Confidence Level | Confidence level for the upper and lower prediction intervals, e.g. 0.9. |
Chart Type | Chart type that will display the time series. Possible values: ‘Forecast’, ‘Decomposition’ |
Positive Values Only | Restrict forecast to models that produce only non-negative values. Possible values: ‘True’, ‘False’ |
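The idea behind the ‘12 Months Hold-Out’ concept can be sketched as follows: keep the last 12 months aside, let each candidate model forecast them from the earlier history, and pick the candidate with the smallest error. This is a minimal sketch, not the component's code. The function name `holdout_select`, the use of the mean absolute error as the criterion and the two simple base R stand-in models are assumptions. With the forecast package installed, `auto.arima` and `ets` could be plugged in as shown in the comments.

```r
# Candidate forecasters: each takes a monthly ts and a horizon h and returns
# h point forecasts. These two are simple base R stand-ins.
candidates <- list(
  SeasonalNaive = function(train, h) {
    last_year <- tail(as.numeric(train), 12)   # repeat the last observed year
    rep(last_year, length.out = h)
  },
  Mean = function(train, h) rep(mean(train), h)
)
# With the forecast package, further candidates could be added like this:
# candidates$ETS <- function(train, h)
#   as.numeric(forecast::forecast(forecast::ets(train), h = h)$mean)
# candidates$AUTO.ARIMA <- function(train, h)
#   as.numeric(forecast::forecast(forecast::auto.arima(train), h = h)$mean)

holdout_select <- function(y, candidates, holdout = 12) {
  n <- length(y)
  stopifnot(n >= holdout + 12)   # keep at least one full year for training
  train <- ts(as.numeric(y)[1:(n - holdout)], start = start(y), frequency = 12)
  test  <- as.numeric(y)[(n - holdout + 1):n]
  # Mean absolute error of each candidate on the held-out months
  mae <- sapply(candidates, function(f) mean(abs(f(train, holdout) - test)))
  list(mae = mae, best = names(which.min(mae)))
}

# Synthetic monthly series with a strong seasonal pattern (4 years)
set.seed(1)
y <- ts(100 + 20 * sin(2 * pi * (1:48) / 12) + rnorm(48, sd = 2),
        start = c(2012, 1), frequency = 12)

result <- holdout_select(y, candidates)
print(result$mae)
print(result$best)
```

After the winner is known, it would be refitted on the full history before producing the actual forecast.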
Output Columns
Column | Description |
---|---|
Year | The year of the data point. |
Month | The month of the data point. |
YearMonthString | The year and month concatenated into a single string. |
ForecastBy | Name of the forecasted time series, e.g. Switzerland, France, USA if the ‘Time Series by’ parameter was set to Country. |
Measure | The measure that is forecasted. |
Type | Indicates whether the current data point is part of the historic data or whether it was forecasted. Possible values: ‘Actuals’, ‘Forecast’ |
PILower | Lower limit of prediction interval. |
PIUpper | Upper limit of prediction interval. |
Model | Describes the forecast model and configuration. |
Example
As an example, let’s forecast the passenger numbers of a transportation company. These are the parameter settings for a forecast by geographic region:
Run the component and you see the results as a custom chart. The component can forecast as many time series as you like. However, only a maximum of three will be displayed in the chart. The result of the component will of course include all time series. Notice how each time series is forecasted very differently. The header in the chart shows the name of the time series, e.g. ‘Middle East’. The sub-header shows the model type and its configuration, e.g. ‘ETS(A,N,A)’, followed by a counter.
The same forecast with chart type ‘Decomposition’ shows the seasonality, trend and remainder of the original time series. This chart needs more space, so it is only displayed for the last time series. If you want to see the chart for a certain time series, you can add a filter component to your analysis to reduce the dataset to the relevant time series.
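If you want to reproduce such a decomposition directly in R, `stl` from base R is one way to split a monthly series into seasonal, trend and remainder parts. Whether the component uses `stl` internally is not stated here, so please treat this as an illustration only. The example uses the built-in `AirPassengers` dataset.

```r
# Built-in monthly example series (airline passengers 1949-1960).
# The log makes the seasonal swings roughly constant in size.
y <- log(AirPassengers)

# Seasonal-trend decomposition with a fixed (periodic) seasonal pattern
dec <- stl(y, s.window = "periodic")

# One column each for seasonal, trend and remainder
components <- dec$time.series
print(head(components))

# The three parts add up to the original series
print(all.equal(as.numeric(rowSums(components)), as.numeric(y)))
```

Calling `plot(dec)` draws the original series and the three components in stacked panels.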
Use a Write Component to save the results for further processing.
You can also try out this component to forecast the number of road accidents happening in Switzerland. Just use the file SwissRoadAccidents.csv.
How to Implement
The component can be downloaded as a .spar file from GitHub. Then deploy it as described here. You just need to import it through the option “Import/Model Component”, which you will find by clicking on the plus sign at the bottom of the list of available algorithms.
Hi Andreas,
Thanks for sharing the component. I tried using it as described in the doc, and I could see the component in the component list after opening PA. I tried to use it in one of my analyses, but PA just hung. After closing it using Task Manager I tried to reopen it, but it just hangs and does not open the analysis. It also does not open other analyses where I used Custom R components created by me. It only opens the models where I have used the standard PA algorithms.
Has something gone wrong after putting the component in the directory mentioned?
Let me know if anything can be done.
Hello Bimal, Most likely the component was still calculating. It calculates and compares many different models. If your dataset has many time series, it might appear as if SAP PA was hanging, whilst it is actually still calculating.
I have just attached some small sample data on Swiss road accidents. Please send me a private message in case this is not working for you and I will investigate.
Greetings
Andreas
Hi Andreas
I've installed your package, but the forecast package will not run as it requires version 3.02 of R. I've tried to get an earlier version of the forecast package that will run with 3.01 of R, but no luck. Can you advise please?
Cheers
Jon
Hello Jon,
The code was created with the forecast package version 4.8. You can download this older Windows package here:
http://cran.r-project.org/bin/windows/contrib/2.15/forecast_4.8.zip
I have got this package working just fine on one computer with SAP PA 1.15 and R 3.0.1. On another machine, however, R 3.0.1 refuses to work with this version. So far, I have to admit, I am not sure why the second machine isn't happy.
Andreas
Hi Andreas
I tried this version but experienced the same issue as you did with the second machine...so I'm kind of stuck. Maybe PA should upgrade to R 3.02.
Thanks for your help
Hello Jon,
I have got a solution now. These steps made it work on three different machines. Just make sure first of all to uninstall the forecast package that you have at the moment.
Step 1: Download the forecast package 4.9 as listed on the website of the package's developer. See bullet point 5 on
http://robjhyndman.com/hyndsight/old-r-packages/
Step 2: Copy the downloaded forecast_4.9.zip into C:\
Step 3: Install the package with this command in R:
install.packages("C:\\forecast_4.9.zip")
Step 4: Install the forecast package's dependencies. If you are not using a proxy, you might not need the first command:
setInternet2(use = TRUE)
install.packages("fracdiff")
install.packages("tseries")
install.packages("RcppArmadillo")
Step 5: Test if the forecast package is installed correctly. This statement should now give a Warning, which is ok.
library(forecast)
You can then test the Custom R Component in SAP PA.
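To see what is installed instead of relying on the warning alone, a small check like the following could be run in the R console first. It is only a sketch: the helper name `check_package` is made up for this illustration, and it uses nothing but base R functions.

```r
# Report whether a package can be loaded and, if so, which version it is.
check_package <- function(pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  list(
    package   = pkg,
    installed = installed,
    version   = if (installed) as.character(utils::packageVersion(pkg)) else NA_character_,
    r_version = R.version.string
  )
}

# The forecast package and the dependencies from step 4
for (pkg in c("forecast", "fracdiff", "tseries", "RcppArmadillo")) {
  info <- check_package(pkg)
  cat(sprintf("%-14s installed: %-5s version: %s\n",
              info$package, info$installed, info$version))
}
cat("Running on", R.version.string, "\n")
```

A package that cannot be loaded is listed with installed: FALSE, which narrows down which of the installation steps needs to be repeated.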
Let me know please if this makes it work for you as well.
Cheers
Andreas
Hi Andreas,
I tried the steps mentioned by you, but I get the following error now.
"Error: package ‘forecast’ was built before R 3.0.0: please re-install it"
I had previously got the forecast package working by manually installing the tar.gz file.
But now that is also not working.
Any idea how to resolve this?
Thanks,
Bimal
Hi Andreas,
Thanks for your help. The forecast package installed with just a warning, as you pointed out. However, I now have the problem that using any R algorithm crashes PA without warning. I've checked the logs, but nothing really stands out as a possible cause. I have uninstalled R and then PA, cleaned the registry, and deleted all content that pertains to R and PA that I know of. I reinstalled PA and then R and tried again without success.
It's a shame, as I'd like to demonstrate this to a customer using their data and your bulk forecasting algorithm.
Any suggestions?
Thanks
Jon
Hello Jon,
I have never seen that kind of behaviour you describe. Maybe you have to execute a remove package command. That made it work for Bimal, but then he had a different issue.
remove.packages("forecast")
Can you use a test machine to
- install a fresh SAP PA
- install / configure R 3.0.1 from within SAP PA
- Then add the forecast package as described above
install.packages("C:\\forecast_4.9.zip")
setInternet2(use = TRUE)
install.packages("fracdiff")
install.packages("tseries")
install.packages("RcppArmadillo")
library(forecast)
Hi Andreas,
Your solution worked for me. I uninstalled the older ones and removed all old files from the lib folder.
I can use the package now.
Hi Andreas
I thought I'd replied but it doesn't appear here. Yes it worked fine on a VM using your Swiss accident data, only graphs though no data grid. Is there something that needs to be enabled for the production of the data grid?
Also I'm not making any progress with R crashing PA, there has to be something in the registry that ensures the corrupt code continues to be used. Don't really want to reload the OS to get around this.
Cheers
Jon
Hi Andreas
Have got the R algorithms working now. I had to delete everything in the registry pertaining to R and all folders on my local drive with R content, and then rebooted my PC. Got PA to install R and then configured it, and then ran R algorithms and your custom solution successfully. Still don't get a data grid though.
Thanks
Jon
Hello Jon,
I am glad you got it working now!
The data grid is on the Results tab. There, on the right-hand side, click the "Bulk Forecasting" component. Here is a screenshot of what it should look like.
Let me know in case this shows up differently for you.
Greetings
Andreas
Hi Andreas,
Thanks very much for sharing the component. I did all the steps and started running the algorithm, but it ended with this error:
Bulk Forecasting by month 2.0: An error occurred while executing commands in the R environment.
Details:
Cause : error from RL " error in estmodel(y, errortype[i], trendtype[j], seasontype[k], damped[i], :
Function cannot be evaluated at initial parameters"
Do you by any chance know more about this error?
Thanks in advance for your help.
Shrirama
By the way, I was able to run the Swiss road accidents analysis successfully. What do you think could be wrong in my data? I have three years of sales per material by year and month.
Hi Shrirama,
Please check that the columns used in the model's configuration have the same data type as the corresponding columns from the Swiss data, i.e. measure, time series by, year and month.
You can see this for instance on the prepare tab. The measure column should NOT have "ABC" next to it. It should say "123" to indicate the column is seen as numeric.
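The same check can be done in R itself before the data goes into the component. The following is a minimal sketch with made-up sample data. The helper name `to_numeric` and the assumption that the text values use a comma as thousands separator are only for illustration.

```r
# Hypothetical data where the measure was read in as text ("ABC")
sales <- data.frame(
  Material = c("M1", "M1", "M2"),
  Year     = c(2014, 2014, 2014),
  Month    = c(1, 2, 1),
  Quantity = c("120", "95", "1,200"),
  stringsAsFactors = FALSE
)

# Which of the columns that must be numeric actually are?
print(sapply(sales[c("Year", "Month", "Quantity")], is.numeric))

# Convert a text measure to numbers, dropping thousands separators first.
# Values that still cannot be parsed become NA.
to_numeric <- function(x) {
  cleaned <- gsub(",", "", as.character(x), fixed = TRUE)
  suppressWarnings(as.numeric(cleaned))
}

sales$Quantity <- to_numeric(sales$Quantity)
print(sapply(sales[c("Year", "Month", "Quantity")], is.numeric))
print(sum(is.na(sales$Quantity)))   # number of values that failed to convert
```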
Greetings
Andreas
Hi Andreas
Thanks for this amazing component, it gives us so many algorithm options to choose from. It's working fine in SAP PA 2.3. I just wanted to check one thing with you: if I want to check the model accuracy and error margin after applying it to my data, how would I do that? The Model Statistics component doesn't work with this. Any idea? In the output below I can't really make out what the error margin is...
Regards
Ranajay
Hi Ranajay,
Thank you for the feedback. Nice to hear you like the component!
It does work with PA 2.3. To compare the forecast accuracy would require something separate, maybe another Custom R element.
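One possible shape for such a separate step, as a minimal sketch: once the actual values for the forecasted months are known, they can be compared with the component's forecast rows. The output column names (ForecastBy, Year, Month, Measure, PILower, PIUpper) are taken from the table above. The function name `forecast_accuracy`, the `Actual` column and all sample numbers are made up for this illustration.

```r
# Hypothetical forecast rows in the component's output layout
forecast_rows <- data.frame(
  ForecastBy = c("East", "East", "West", "West"),
  Year       = 2016,
  Month      = c(1, 2, 1, 2),
  Measure    = c(100, 110, 50, 55),
  PILower    = c(90, 95, 40, 45),
  PIUpper    = c(110, 125, 60, 65),
  Type       = "Forecast",
  stringsAsFactors = FALSE
)

# Hypothetical actual values that became available later
actuals <- data.frame(
  ForecastBy = c("East", "East", "West", "West"),
  Year       = 2016,
  Month      = c(1, 2, 1, 2),
  Actual     = c(105, 100, 50, 70),
  stringsAsFactors = FALSE
)

forecast_accuracy <- function(forecast_rows, actuals) {
  m <- merge(forecast_rows, actuals, by = c("ForecastBy", "Year", "Month"))
  d <- data.frame(
    err    = m$Actual - m$Measure,
    actual = m$Actual,
    inside = m$Actual >= m$PILower & m$Actual <= m$PIUpper
  )
  per_series <- split(d, m$ForecastBy)
  do.call(rbind, lapply(names(per_series), function(s) {
    p <- per_series[[s]]
    data.frame(
      ForecastBy = s,
      MAE      = mean(abs(p$err)),                        # mean absolute error
      MAPE     = 100 * mean(abs(p$err) / abs(p$actual)),  # in percent; Inf if an actual is 0
      Coverage = mean(p$inside),                          # share of actuals inside the interval
      stringsAsFactors = FALSE
    )
  }))
}

acc <- forecast_accuracy(forecast_rows, actuals)
print(acc)
```

If no later actuals are available yet, the same comparison can be made by cutting the last months off the history, forecasting them and treating the cut-off values as the actuals.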
Having said this, the "Bulk Forecasting" component was created before SAP acquired KXEN. This is now Automated Analytics within SAP PA and comes with its own dedicated time series forecasting. It's extremely powerful, for instance because it allows for additional predictor variables. The help file comes with a nice example on forecasting a single cash flow time series:
http://help.sap.com/businessobject/product_guides/pa23/en/pa23_ts_user_en.pdf (just search for CashFlows.txt)
I hope to publish a small tutorial soon that explains how to use that Automated engine to forecast multiple time series at once.
Many Greetings
Andreas
Hi Andreas
I tried the Automated Analytics time series too. Actually, the part of your component I like best is the hierarchy-based aggregation and the sorting done in the code itself, after which the predictive algorithm is applied. I don't think that is possible in the Automated Analytics version. Manual intervention for data preparation is very much required in Automated, unlike with this custom component.
That's why I wanted to use this component. But error margin and accuracy are very much needed before implementing it. Any tip on how to do it in the custom R component?
Regards
Ranajay
Hello Ranajay,
In case you have only one time series but multiple values per date, you could use this aggregation component to help prepare the data for the time series forecasting in Automated Mode. Just aggregate by date, probably with the sum of your measure.
http://scn.sap.com/docs/DOC-43445
For Automated Mode the data has to be sorted in ascending order (latest information at the bottom). I am not sure if the above component will sort the data, but that could be an easy manual step (or another custom R component ;-). In case you are interested in creating your own custom R extension, here is an overview page to get you started:
http://scn.sap.com/docs/DOC-62119
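The aggregation by date and the ascending sort mentioned above could look roughly like this in R. It is only a sketch with made-up data for a single time series. The column names and the extra `Date` column (first day of each month) are assumptions for this illustration.

```r
# Hypothetical input: one time series, several rows per month, unsorted
cash <- data.frame(
  Year   = c(2015, 2014, 2015, 2014, 2015),
  Month  = c(2, 12, 1, 12, 2),
  Amount = c(30, 10, 20, 5, 12)
)

# Sum the measure per month
monthly <- aggregate(Amount ~ Year + Month, data = cash, FUN = sum)

# Sort in ascending order, so that the latest month ends up at the bottom
monthly <- monthly[order(monthly$Year, monthly$Month), ]
rownames(monthly) <- NULL

# Optional: a single date column built from year and month
monthly$Date <- as.Date(sprintf("%d-%02d-01", monthly$Year, monthly$Month))
print(monthly)
```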
Many Greetings
Andreas