SAP BTP Showcase – Run future sales prediction using SAP HANA Cloud Machine Learning algorithms
This is the 4th of 7 blog posts which are part of the SAP Business Technology Platform Showcase series of blogs and videos. We invite you to check this overall blog, so you can understand the full end-to-end story and the context involving multiple SAP BTP solutions.
Here you will see how to create an SAP HANA Cloud HDI container, load historical training and test data, and run Predictive Analysis Library (PAL) procedures to predict future sales (energy consumption) values just in time.
After completing all the steps described in this blog, you should be able to:
- Create an HDI container and deploy a project using SAP Business Application Studio
- Run a just-in-time Exponential Regression algorithm to predict future values
- Create a view in SAP Data Warehouse Cloud comparing actual and predicted values
- Create a simple dashboard comparing actual and predicted values
As shown below, this is the 4th step of the “Solution Map” prepared for the journey described in the overall blog:
SAP BTP Showcase – Overall Technical Architecture
Have you ever improved your analytics using SAP HANA Cloud’s Machine Learning (ML) capabilities?
This is possible using the SAP HANA Cloud Predictive Analysis Library (PAL).
In the picture below you can see how to leverage SAP HANA Cloud Advanced Analytics features, using an optimized engine for statistical algorithms:
As you can see, SAP HANA Cloud can run sophisticated statistical algorithms directly in the same memory area where the data is already persisted. Unlike the traditional approach, data doesn’t need to be replicated to a specific engine or area in order to run Machine Learning. In addition, SAP HANA Cloud’s “in-memory” architecture can speed up execution by several orders of magnitude (+1000x faster). Conclusion: Compared to the traditional approach, SAP HANA Cloud can provide a much simpler and faster platform for Data Scientists.
In this example, we will create an SAP Business Application Studio project and deploy an HDI container on SAP HANA Cloud. This HDI container will persist all historical data for training and testing the model, and then apply an Exponential Regression algorithm to calculate predicted values based on the past behavior of sales (energy consumption) values. Predicted values will then be consumed by business users in SAP Data Warehouse Cloud and SAP Analytics Cloud.
Forecast analysis in SAP Analytics Cloud
We understand that there are multiple alternatives for exploring the power of SAP HANA Cloud for Machine Learning. Other examples are:
- Run Machine Learning algorithms using Python on top of SAP HANA Cloud (targeted to Data Scientists)
- Run SAP Analytics Cloud’s embedded forecast features (targeted to business users – this topic is also presented in Blog 7: Consume SAP Data Warehouse Cloud’s assets using SAP Analytics Cloud)
Although there’s no need for a dedicated HDI container to run Machine Learning in SAP HANA Cloud, we decided to create a specific container to keep this blog easy to follow. Generally, architects run PAL algorithms where the data already resides, to avoid duplicating data and to gain simplicity and higher performance.
SAP HANA Cloud provides the ability to run Machine Learning algorithms “in-memory”.
Let’s follow a step-by-step implementation for the PAL approach, developing a new SAP HANA Native Application project using SAP Business Application Studio.
Although you will be able to watch and learn everything that is explained in this blog in a detailed technical video, you may also want to implement this concept yourself. In that case, let’s make sure you have provisioned everything you need for deploying the project.
If you still don’t have an SAP account for starting your developments, don’t worry: SAP provides a completely free SAP trial account, so you can join our network and get access to all SAP BTP solutions (SAP HANA Cloud, SAP Data Warehouse Cloud, SAP Analytics Cloud) required to implement all projects and artifacts presented in this blog.
You can follow the mission Getting Started with your SAP HANA Cloud Trial for Provisioning an SAP HANA Cloud instance.
We will use SAP Business Application Studio for developing this project. If you still do not have access, you can learn how to set up a new subscription here.
The project source code is available in the official GitHub SAP-samples and can be cloned in SAP Business Application Studio, so you can easily reproduce this deployment in your own landscape.
We assume you already have SAP HANA Deployment Infrastructure (HDI) skills for deploying this project. You can watch this official SAP HANA Academy video if you want to improve your knowledge, and also take a look at the SAP Help Portal – HDI Reference documentation.
The Predictive Analysis Library (PAL) defines functions that can be called from within SQLScript procedures to perform analytic algorithms. The current release of PAL includes classic and universal predictive analysis algorithms in ten data-mining categories, including:
- Time Series
- Social Network Analysis
- Recommender System
You can get detailed information about all of those algorithms, as well as the one used in this blog, on the official documentation available at the SAP Help Portal – PAL Exponential Regression algorithm.
Now, we will split the scenario into 4 main topics in order to facilitate your understanding:
In our end-to-end demonstration, we talk about a utility company that wants to optimize its Energy Production, as already explained in the overall blog. Our data warehouse will have 4 main source tables/views:
- Energy Consumption Actual – Quantity of energy consumed by people in Germany in MWh
- Energy Production Actual – Quantity of energy produced by utilities
- Energy Consumption Predicted (calculation view) – Applying a Machine Learning algorithm
- Energy Production Planned – Quantity of energy planned for production by utilities
In this blog we will focus on item 3, Energy Consumption Predicted.
For running the PAL algorithm, we will need 3 persisted datasets (which were included in the source code of the project):
- ENERGY_CONSUMPTION_TRAINING_80.csv (historical data for training)
- ENERGY_CONSUMPTION_TEST_20.csv (historical data for testing the coefficient)
- ENERGY_CONSUMPTION_FORECAST_INPUT.csv (future data to be predicted)
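The _80 and _20 suffixes reflect a random 80/20 split of the historical records. The snippet below is only a minimal sketch of how such a split could be produced; the function name, the fixed seed and the sample values are hypothetical and are not taken from the project source code.

```python
import random
from datetime import datetime, timedelta


def split_train_test(records, train_fraction=0.8, seed=42):
    """Randomly split records into (training, test) lists."""
    shuffled = list(records)               # copy, so the input stays untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample: one day of 15-minute snapshots with made-up values.
start = datetime(2020, 1, 1)
records = [(start + timedelta(minutes=15 * i), 50000 + i) for i in range(96)]

training, test = split_train_test(records)
print(len(training), len(test))
```

Every record ends up in exactly one of the two lists, so the test set only contains rows the model has not seen during training.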
2- Run the Exponential Regression algorithm to predict future values
We will take a random 80% of five years of historical Energy Consumption Actual data, exported from a system of record (e.g. SAP ERP) in .csv format, and persist it in this HDI container. Then we will take the remaining data (the other random 20% of records) and persist it in the database as well. These datasets contain granular historical data: regular 15-minute snapshots of the amount of energy being consumed, in MWh, at that moment. Together, the datasets hold almost 200K records. You can see an example of this data below:
For this use case, we will predict future values for the coming year, so we will have another persisted table covering 365 days divided into slices of 15 minutes each. That gives about 35K records to be predicted (365 days × 96 slices per day = 35,040), as the picture below demonstrates.
We understand that there are multiple ways to load data into SAP HANA Cloud, and that we could use even more sophisticated alternatives for replicating data. However, the intention here is to stick to the technical Machine Learning’s use-case.
A possible scenario, for example, is SAP Landscape Transformation Replication Server replicating ERP’s data to SAP HANA Cloud in real-time.
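As a rough check on the size of the forecast input table described above, the sketch below generates one timestamp per 15-minute slice for a 365-day year. The start date and the function name are assumptions chosen for illustration; the actual content of ENERGY_CONSUMPTION_FORECAST_INPUT.csv may be laid out differently.

```python
from datetime import datetime, timedelta


def forecast_slots(start, days=365, step_minutes=15):
    """Return one timestamp per time slice for the given number of days."""
    slots_per_day = 24 * 60 // step_minutes   # 96 slices for 15-minute steps
    step = timedelta(minutes=step_minutes)
    return [start + i * step for i in range(days * slots_per_day)]


# Hypothetical forecast year (2021 is not a leap year).
slots = forecast_slots(datetime(2021, 1, 1))
print(len(slots))            # 35040
print(slots[0], slots[-1])
```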
We will develop a Table Function to calculate predicted values using the Exponential Regression algorithm, and then create a Calculation View that invokes this Table Function, encapsulating all the processing in a simple artifact that can be consumed directly by SAP Data Warehouse Cloud.
The highlight of this approach is that the Machine Learning algorithm is executed just in time, every time the Calculation View is invoked. That means that any new data made available in any of those 3 input datasets will be reflected in updated predicted results.
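To give an intuition for what an exponential regression does, here is a small, self-contained sketch. It fits y = b0 * exp(b1 * x) for a single input variable by ordinary least squares on ln(y). This is an illustration only: it is not the PAL implementation, it does not call PAL, and the function names and sample points are made up.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = b0 * exp(b1 * x) by least squares on ln(y); needs every y > 0."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    b1 = sxy / sxx                          # slope of the line in log space
    b0 = math.exp(mean_ly - b1 * mean_x)    # intercept mapped back from log space
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted curve at x."""
    return b0 * math.exp(b1 * x)


# Made-up training points generated from y = 2 * exp(0.5 * x).
train_x = [0, 1, 2, 3, 4]
train_y = [2 * math.exp(0.5 * x) for x in train_x]

b0, b1 = fit_exponential(train_x, train_y)
print(round(b0, 6), round(b1, 6))
print(round(predict(b0, b1, 5), 4))
```

Because the coefficients are recomputed from whatever training data is passed in, calling the fit again after new rows arrive yields updated predictions, which mirrors the just-in-time behavior described above.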
3- Create a view in SAP Data Warehouse Cloud comparing actual and predicted values
In SAP Data Warehouse Cloud, we will create a new Connection to this HDI container, deploy the artifacts/tables, and then create a “Graphical View” comparing Consumption Actuals and Consumption Predicted values. This comparison provides business users with an interesting analysis.
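Conceptually, the comparison joins actual and predicted values on their timestamp and looks at the difference. The sketch below does this in plain Python and adds a mean absolute percentage error (MAPE) as one possible summary figure. The metric, the function name and the sample values are assumptions for illustration and are not part of the SAP Data Warehouse Cloud view.

```python
def compare(actual, predicted):
    """Join two {timestamp: value} dicts on timestamp; return rows and MAPE in %."""
    rows = []
    for ts in sorted(actual.keys() & predicted.keys()):   # inner join on timestamp
        a, p = actual[ts], predicted[ts]
        rows.append((ts, a, p, p - a))                    # deviation = predicted - actual
    ape = [abs(p - a) / abs(a) for _, a, p, _ in rows if a != 0]
    mape = 100 * sum(ape) / len(ape) if ape else float("nan")
    return rows, mape


# Made-up values; the third predicted slot has no matching actual yet.
actual = {"2020-01-01 00:00": 100.0, "2020-01-01 00:15": 200.0}
predicted = {
    "2020-01-01 00:00": 110.0,
    "2020-01-01 00:15": 190.0,
    "2020-01-01 00:30": 150.0,
}

rows, mape = compare(actual, predicted)
for row in rows:
    print(row)
print(round(mape, 2))
```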
4- Create a simple dashboard comparing actual and predicted values
We will then demonstrate how to easily consume this content in SAP Analytics Cloud, and create an interesting graph for business users to get insights.
Now that you understand the complete scenario, you can watch the detailed technical video to see how this project can be implemented.
All source code and data are available in the official GitHub SAP-samples.
In this blog you learned how to create an SAP HANA Cloud HDI container, load historical training and test sales data, and run Predictive Analysis Library (PAL) procedures for just-in-time future sales predictions.
All of your feedback is appreciated. Enjoy!