Regression Prediction Scenario in SAP Analytics Cloud
In this blog post, let’s see how to build a Regression predictive scenario in SAP Analytics Cloud (SAC). The scenario chosen for this blog post is profit prediction for startup companies. We are going to use the ‘Smart Predict’ feature, which is the most advanced predictive feature in SAC.
To address prediction in different business use cases, SAC provides three types of Smart Predict scenarios, namely Classification, Regression, and Time Series.
After selecting the scenario, we train the predictive model on historical data from our data source. SAC uses built-in machine learning algorithms to learn from that historical data and generate predictions. The predicted data can be saved in SAC and consumed in stories and applications.
In our case, we are going to use the Regression scenario. Regression is used to estimate the value of a measure, that is, a numeric column such as profit.
Profit prediction of Startup Companies using SAC Regression Predictive Scenario:
The version of SAC used for this scenario is 2021.20. The data is a dataset of startup companies, divided into training data and test data. The dataset contains records of 1000 companies; let’s take 900 records for the training dataset and the remaining 100 records for the test dataset.
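The 900/100 split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the files for this post were prepared: the function name `split_records`, the fixed seed, and the list of IDs standing in for the company records are all assumptions.

```python
import random


def split_records(records, train_size, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    if not 0 <= train_size <= len(records):
        raise ValueError("train_size must be between 0 and len(records)")
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    return shuffled[:train_size], shuffled[train_size:]


# Hypothetical stand-in for the 1000 company records (IDs only).
company_ids = list(range(1, 1001))
train_ids, test_ids = split_records(company_ids, train_size=900)
print(len(train_ids), len(test_ids))
```

Shuffling before splitting keeps the test records from being, for example, only the last 100 rows of the file, which could bias the comparison later on.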
Now, let’s dive into the step-by-step procedure.
1)Importing the training dataset
In the dataset tab in SAC, import the training dataset by choosing the Excel/CSV file. In our case, let’s name the training dataset ‘Startup’.
As we can see in the dataset overview pane, our dataset has 900 rows and 6 columns. There are 2 dimensions and 4 measures.
Let’s see a detailed description of all the columns.
ID: Unique ID of the startup company.
R&D Spend: The amount the company spends on Research & Development.
Administration: The amount the company spends on administration.
Marketing Spend: The amount the company spends on marketing.
State: The place where the company began its initial operations.
Profit: The profit of the company. This is the column we want to predict.
3)Predictive Model Creation
Let’s create and compare predictive models to find the one that gives the best predictions.
Let’s choose Regression in the Predictive Scenarios application.
Now let’s select our training dataset ‘Startup’ as the data source and Profit as the target (the numeric column containing the data to be predicted).
Now click on ‘Train’. Training is the process in which SAC Smart Predict uses machine learning algorithms to explore relationships in your data source and come up with the best combinations for the predictive model.
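To get an intuition for what "training" means, here is a minimal sketch that fits a straight line to a single input column using ordinary least squares. This is not the algorithm Smart Predict runs internally, which this post does not describe; it is a deliberately simple stand-in. The helper names `fit_line` and `predict` and the sample numbers are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data where profit is exactly 2 * rd_spend + 1.
rd_spend = [1.0, 2.0, 3.0, 4.0]
profit = [3.0, 5.0, 7.0, 9.0]
model = fit_line(rd_spend, profit)
print(model, predict(model, 5.0))
```

The fitting step learns the relationship from historical rows, and the predict step reuses it on a row the model has not seen. That is the same train-then-apply pattern we follow in SAC, just with a far simpler model.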
Within a few seconds, ‘Model 1’ is trained. Now it’s time to assess our predictive model.
The Prediction Confidence measures whether the predictive model can make predictions with the same reliability when new cases arrive. In our model, the Prediction Confidence is 99.56%, which is near perfect.
Target statistics provide the minimum, maximum, mean, and standard deviation of both the training and validation data.
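As a rough illustration of what these four figures are, the sketch below computes them for a small list of target values with Python's standard library. The sample profits are invented, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption: this post does not state which variant SAC reports.

```python
import statistics


def target_statistics(values):
    """Return min, max, mean and standard deviation of a list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: population standard deviation; SAC may use another variant.
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical profit values, not taken from the startup dataset.
sample_profits = [10.0, 20.0, 30.0, 40.0]
stats = target_statistics(sample_profits)
print(stats)
```

Running the same function on the training and the validation targets gives two sets of figures to compare. If they differ widely, the two partitions may not be representative of each other.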
During training, Smart Predict calculates an optimized set of influencers to include in your predictive model. Influencers are the variables that have an impact on the target variable.
In our case, Smart Predict has identified both R&D Spend and Marketing Spend as influencers, of which R&D Spend contributes the most at 91.88%.
Now we can see that there is not much difference between the Prediction Confidence of the two models, Model 1 and Model 2, so their predictions should be nearly the same.
Let’s check that by applying both models to our test dataset.
4)Applying Predictive Model:
Let’s import our test dataset from the Excel/CSV file. After importing it, let’s apply the predictive model to the test dataset in our predictive scenario.
The output will be stored as a separate dataset where we can see the profit value predicted by our predictive model for the test dataset values.
Now, following the same procedure, let’s apply Model 2 to our test dataset and compare the values predicted by both models with the actual profit of each record in the test dataset, company ID by company ID.
The perfection percentage for each company is calculated by dividing the actual profit by the predicted profit, or vice versa, whichever way keeps the ratio at or below 100%. The overall model perfection percentage is the average of the perfection percentages of all the companies.
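The calculation can be sketched as follows. It assumes one reading of "or vice versa", namely that the smaller of the two values is divided by the larger so the result never exceeds 100%, and that profits are positive. The function names and the sample pairs are hypothetical and do not come from the actual test dataset.

```python
def perfection_pct(actual, predicted):
    """Ratio of the smaller value to the larger one, as a percentage."""
    if actual <= 0 or predicted <= 0:
        raise ValueError("this ratio is only meaningful for positive values")
    return min(actual, predicted) / max(actual, predicted) * 100


def model_perfection_pct(pairs):
    """Average the per-company perfection percentage over all pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    scores = [perfection_pct(actual, predicted) for actual, predicted in pairs]
    return sum(scores) / len(scores)


# Hypothetical (actual, predicted) profit pairs for three companies.
pairs = [(100.0, 80.0), (50.0, 100.0), (200.0, 200.0)]
overall = model_perfection_pct(pairs)
print(round(overall, 2))
```

Because the ratio is symmetric, an over-prediction and an under-prediction by the same factor get the same score. That makes it easy to average, but it also hides the direction of the error.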
As we mentioned earlier, there is not much difference between the perfection percentages of the two models, since their Prediction Confidence was nearly the same. Looking closely at the numbers, we can conclude that Model 1 is the better predictive model, with Prediction Confidence and perfection of 99.56% and 97.82% respectively.
Hence, our regression predictive model has predicted the profit for the startup companies dataset with Prediction Confidence and perfection of 99.56% and 97.82% respectively. I hope this blog post has helped you understand regression predictive scenarios in SAP Analytics Cloud through a practical use case. Your ideas and suggestions are welcome.