Data Science & SAP Analytics Cloud
This blog is a continuation of my earlier blog on Data Science: https://blogs.sap.com/2020/07/07/sap-s-4-hana-and-data-science/.
SAP Analytics Cloud (SAC) is a Software as a Service (SaaS) platform that provides analytical capabilities to all users in one product. SAC covers analysis, planning, prediction and reporting in one place, which saves time and effort.
This blog explains how predictive analysis is embedded in SAC to simplify business understanding, and how the business can extract powerful information to support decisions about future planning.
The whole idea of this blog is to help Business Users understand the use of predictive analysis and how SAC can simplify their lives. I do not intend to use mathematical analysis, the complex logic of inferential statistics or advanced hypothesis testing. I strongly believe in a Business-first policy, with IT as an enabler that makes Business life easy. So my blog focuses on the usage of statistics without getting into how the formulas work or are derived.
To help Business Users, SAP has integrated automated predictive features into SAC. Most businesses need three main predictive techniques, so SAP provides functionality that caters to the most popular business demands.
The predictive techniques are:
1. Classification: In simple terms, a data mining function that assigns items to target categories or classes.
- Credit card applications: Which group of people may turn out to be fraudulent?
- Energy and resources: Which area or age group will have demand for rooftop solar panels?
2. Regression: In simple terms, the relationship between certain variables, used to quantify a cause-and-effect relationship.
- Retail: What influences sales? Which factors have a cause-and-effect relationship with my revenue?
- Supply chain: The price of raw material has a direct relationship with the oil price. Which other factors have a cause-and-effect relationship with logistics?
3. Time Series: In simple terms, data collected over a specific period and used to predict behaviour over a span of time.
- Plant or factory: When might a machinery breakdown occur in the future?
- Retail: The sale of gold will increase during a certain festive season. When is the right time for my business to enter the market?
A brief overview of data sources
We can leverage various data sources, some of them being:
- Excel or CSV files
- Generic OData sources
- SAP applications
- SAP S/4HANA sources (CDS views)
- SQL databases
In most cases, your ABAP or UI5 consultant can help by providing the CDS views if you have S/4HANA. For Excel and CSV, it is a simple upload.
Out of the three predictive techniques, I shall try to explain some interesting case studies on the Regression model. It is easy and simple, and it can give you some of the best predictions to quantify the cause-and-effect relationship.
Simple Linear Regression:
Ice Cream Sale: I have just plotted a graph in Excel; you can create some values and a similar graph for your own understanding. SAC will be used for multiple regression, when we have more than one factor.
| Temperature (°C) | No. of ice creams sold |
So, if I want to predict the sales of ice cream when the temperature is 40 °C, I shall use the simple linear equation
y = mx + c = 5.1054 × 40 + 5.4384 = 209.65 ≈ 210 cups of ice cream.
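The arithmetic above can be checked with a few lines of Python. This is only a sketch: the slope and intercept are the values quoted in this blog from the Excel trend line, and the function name `predict_ice_cream_sales` is made up for illustration.

```python
# Slope (m) and intercept (c) as quoted in this blog from the Excel trend line.
SLOPE = 5.1054
INTERCEPT = 5.4384


def predict_ice_cream_sales(temperature_c: float) -> float:
    """Return the predicted cups sold using y = m*x + c (hypothetical helper)."""
    return SLOPE * temperature_c + INTERCEPT


if __name__ == "__main__":
    cups = predict_ice_cream_sales(40)
    print(f"Predicted sales at 40 °C: about {round(cups)} cups")
```

Rounding to the nearest whole cup gives about 210 cups at 40 °C.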
R²: For now, let us just understand that the higher the R² (the closer it is to 1), the better the model fits the data. In our simple case it is ideal.
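For readers who want to see where the trend line and R² come from, here is a minimal sketch of an ordinary least-squares fit in plain Python. The temperatures and sales figures below are made up for illustration and are not the numbers behind my Excel chart; the helper names `fit_line` and `r_squared` are also my own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustration data, NOT the figures behind the chart in this blog.
temperatures_c = [20, 25, 30, 35, 40]
cups_sold = [108, 133, 159, 184, 210]

slope, intercept = fit_line(temperatures_c, cups_sold)
r2 = r_squared(temperatures_c, cups_sold, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R²={r2:.4f}")
```

Because the made-up points lie almost exactly on a straight line, R² comes out very close to 1. Real sales data will usually be noisier and give a lower value.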
Now suppose I want to consider some more factors that might be affecting my ice cream sales. This will give me a direction for my ice cream selling strategy.
So now my formula would be: y = c + m1x1 + m2x2 + m3x3 + … + mnxn.
To keep it simple, I am considering the effect of rainfall on the sale of ice cream. Then we can see how to run a regression model in SAC.
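Before turning to SAC, here is a rough sketch of what a two-factor regression (temperature and rainfall) looks like in plain Python. It is an ordinary least-squares fit solved through the normal equations, which is my own choice for illustration and not a description of what SAC does internally. The data is synthetic: I generated it from the assumed formula y = 10 + 5·temperature − 2·rainfall so that we can check the fit recovers those numbers. The function name is hypothetical.

```python
def fit_two_factor_regression(x1, x2, y):
    """Least-squares fit of y = c + m1*x1 + m2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Build the 3x3 system (X^T X) beta = X^T y as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]  # [c, m1, m2]


# Synthetic data generated from y = 10 + 5*temperature - 2*rainfall (assumed).
temperature_c = [20, 25, 30, 35, 40, 28]
rainfall_mm = [0, 5, 2, 10, 1, 7]
cups_sold = [110, 125, 156, 165, 208, 136]

c, m1, m2 = fit_two_factor_regression(temperature_c, rainfall_mm, cups_sold)
print(f"c={c:.2f}, m1 (temperature)={m1:.2f}, m2 (rainfall)={m2:.2f}")
```

With more than two factors the same idea applies, only with a larger system of equations. That bookkeeping is what a tool like SAC takes off the Business User's hands.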
Steps in SAC
1. We go to the Menu.
2. Select Predictive Scenario.
3. Select Regression.
4. We create a predictive model for Ice-Cream and train it on our source data. The observations in the training dataset are the foundation of our predictive model.
5. We get certain values, which are explained below in a simplified manner considering the scope of this blog.
Root Mean Square Error (RMSE): Measures the typical difference between the values predicted by my model and the actual values (the square root of the average squared difference).
Prediction Confidence: Measure of how reliably the predictive model keeps the same accuracy when it is applied to new data. For reference, 95% or above is considered a very good score and 85-95% is still considered good.
Mean: Average of the dataset.
Std. Deviation: Dispersion of the dataset around the mean.
6. The Influencers: the picture describes it all. It shows the relative importance of each variable used in the predictive model. In our case these are rain and temperature.
7. The most crucial information
Let's keep this simple; please note that SAP's whole idea is to make this easy for everyone.
- Validation - Actual: the actual target value as a function of the prediction (y = c + m1x1 + m2x2 + m3x3 + … + mnxn).
- Perfect Model: all predictions are equal to the actual values.
- Validation Error Min & Max: the deviation of my current predictive model.
- Validation & Perfect Model: the two curves match, hence the predictive model is accurate.
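The summary values from step 5 (RMSE, mean and standard deviation) can be reproduced by hand on a small example. The sketch below uses made-up actual and predicted sales, and it uses the sample standard deviation from Python's `statistics` module. That choice is an assumption on my part, and SAC may calculate its figure slightly differently.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values for illustration only.
actual_sales = [100, 150, 200]
predicted_sales = [110, 140, 200]

error = rmse(actual_sales, predicted_sales)
mean_sales = statistics.mean(actual_sales)
std_sales = statistics.stdev(actual_sales)  # sample standard deviation

print(f"RMSE={error:.2f}, mean={mean_sales}, std. deviation={std_sales:.2f}")
```

An RMSE that is small compared with the mean and standard deviation of the actual values suggests that the predictions stay close to reality. For a perfect model, where every prediction equals the actual value, the RMSE would be exactly 0.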
Hence, we can conclude that Rainfall and Temperature are two strong influencers in predicting the sales of ice cream. Although temperature would generally seem to be the bigger contributor, rain has the higher value here, and that is the beauty of this model.
Other simple case studies that could be considered for multiple regression are:
- Trend of employee performance and the multiple factors that influence the trend.
- Key factors to control the cost of production.
- Trend of sales of a product with influencing factors such as location, price, promotion, etc.
My next blog will be on how we can use Inferential Statistics in a very simple and easy way using R in SAC.