This blog post aims to provide a comprehensive overview of the most recent content. It’s meant to be a living document, so we’ll try to keep it as up to date as possible.
The links are structured as follows:
- Getting started – Your first steps with SAP HANA Machine Learning
- Machine Learning Operationalization – How to bring Machine Learning to life in real-world scenarios
- Deep-Dive sections into:
  - Outlier and Anomaly Detection
  - Time Series Analysis
  - Text Mining
- External Machine Learning – How to integrate with your external R or TensorFlow servers
- Miscellaneous – All other relevant materials.
Denys van Kempen provides a broad overview of materials to get started with SAP HANA Machine Learning in this blog. You will find links to the documentation, recent demo videos and code samples in this collection:
With our simple SQL interface, developers and data scientists can easily work with all the features of the SAP HANA database and integrate them with any other SQL-based solution. While SQL is considered the third most popular language for Machine Learning, we are aware that other scripting languages, specifically Python and R, are even more popular with data scientists.
To get started with the Python and R APIs, you might want to take a look at our initial release blog post by Arun Godwin Patel:
Arun also wrote two great articles on the power of the SAP HANA dataframe object that is introduced with the APIs. It builds the foundation of all the Machine Learning features in the library, so understanding it is important for all the following articles:
- Diving into the HANA DataFrame: Python Integration – Part 1
- Diving into the HANA DataFrame: Python Integration – Part 2
Last but not least, he introduces us to the Exploratory Data Analysis feature. This toolset allows us to visually explore data using different graphics and charts, while benefiting from the remote execution of aggregation statements.
Are you looking for the ultimate step-by-step guide on how to get started with Python and Machine Learning in SAP HANA Cloud? Well, Andreas Forster has you covered with this article:
Too much Python around here so far from your perspective? Perfect timing to spotlight our R API!
Yannick Shaper prepared two articles on the basics of working with R and SAP HANA, and the integration of R-based SAP HANA Machine Learning within a Shiny App:
- Hands-On Tutorial: Becoming the Chief Data Cook with RStudio and SAP HANA
- Hands-On Tutorial: Leverage SAP HANA embedded Machine Learning through an R Shiny App
After taking the first steps into Machine Learning models, you might want to bring them to life in a production scenario. But before doing that, make sure the model meets your quality criteria!
Raymond Yao has prepared a great example of how to use the new Model Report for that. It assists in understanding and debriefing a trained model by displaying model statistics, variable importance and standard metrics.
Machine Learning Operationalization
Maintenance of the Machine Learning model lifecycle and especially versioning of different states of a model is an important part of making Machine Learning enterprise ready. This article from Xin Chen explains the details of how to set up a model storage in SAP HANA using the Python client for Machine Learning:
For managing and orchestrating large Data Science and Machine Learning architectures, SAP Data Intelligence comes into play. In this article, Andreas Forster explains how to leverage the Python client for SAP HANA Machine Learning with the Jupyter Notebook operator in Data Intelligence:
To make Machine Learning part of a complex data processing workflow, you can include SAP HANA Machine Learning in your SAP Data Intelligence pipelines. The following article from Andreas gives you all the details to get started:
Your data resides in SAP Data Warehouse Cloud together with your business reports, but you don’t want to miss the power of SAP HANA Machine Learning? We’ve got you covered! Learn how to leverage SAP HANA Machine Learning from DWC:
One of the most common scenarios for SAP HANA Machine Learning is the implementation in the context of ABAP-based SAP applications, like SAP S/4HANA or SAP BW/4HANA.
Jerome Zhao showcases how to call SAP HANA Machine Learning functions from ABAP in this article:
To provide a more sophisticated integration, especially with SAP S/4HANA, Intelligent Scenario Lifecycle Management (ISLM) was introduced. It orchestrates all Machine Learning activities, such as the creation of scenarios and models as well as the training, deployment and activation of those Machine Learning models.
Venkata Raghu Banda has prepared a comprehensive collection of materials on ISLM in this blog post:
Some of the highlights of his collection are:
- Embedding Machine Learning into SAP S/4HANA
- Leveraging Machine Learning with the ISLM Framework
- Migration of PAi to ISLM with the embedded ML
You may also check our overview pages on Intelligent ERP and ISLM to get the latest updates:
Let’s take a closer look into some of the most important scenarios for SAP HANA Machine Learning.
Andreas Forster created a nice demo on the use of regression techniques to predict used car prices, using the Python API.
Kurt Holst contributed a series of three blog posts focused on a classification scenario. He demonstrates the end-to-end implementation making use of the R API, as well as how to evaluate the business value of a model created that way:
- Machine Learning with SAP HANA – from R
- Machine Learning with SAP HANA – with R API Part 2
- Machine Learning with SAP HANA & R – Evaluate the Business Value
Outlier and Anomaly Detection
Likun Hou has prepared four blog entries on different techniques for outlier and anomaly detection.
The first article demonstrates the usage of Statistical Tests (Variance Test and IQR Test) for Outlier Detection. Likun shows that the IQR test is the more robust outlier detection method in the presence of values that deviate extremely from the mean or median of the targeted numerical feature. However, both methods only work on one-dimensional numerical data, so they are mainly applicable to outliers with at least one outstanding numerical feature value.
The second option is to use the DBSCAN clustering algorithm to perform Outlier Detection. Unlike the statistical test approach above, all feature values can be taken into account if an appropriate distance metric is adopted.
Typically, these two methods (Statistical Test and Clustering) can only detect outliers in the input dataset, and the detection result cannot be generalized to new data points, because they do not produce a model. The third article demonstrates how Classification methods can be adopted to overcome this difficulty.
However, all the aforementioned techniques become less applicable when the dataset of interest is of high dimensionality (i.e. contains many features), or when the boundary between normal points and anomalous ones is complicated. In his fourth article, Likun demonstrates a better approach: manually labeling the anomalous points in the dataset, and then training a supervised Machine Learning model to classify normal points and anomalies.
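To make the difference between the two statistical tests concrete, here is a minimal sketch in plain Python. It is not the SAP HANA PAL implementation: the function names, the 3-sigma threshold, the 1.5 IQR multiplier and the sample values are all assumptions chosen for illustration.

```python
import statistics

def variance_test(values, sigma_num=3.0):
    """Flag values lying more than sigma_num standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > sigma_num * std]

def iqr_test(values, multiplier=1.5):
    """Flag values lying outside [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical one-dimensional feature with a moderate and an extreme outlier.
data = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 60, 1000]

variance_outliers = variance_test(data)  # -> [1000]
iqr_outliers = iqr_test(data)            # -> [60, 1000]
print(variance_outliers, iqr_outliers)
```

The extreme value 1000 inflates both the mean and the standard deviation, so the variance test no longer notices 60. The quartiles barely move, which is why the IQR test still flags both values.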
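The idea behind clustering-based outlier detection can be sketched with a small, textbook-style DBSCAN in plain Python: points that do not end up in any dense region are reported as noise, that is, as outliers. This is an illustrative sketch rather than the PAL algorithm; the `eps` and `min_pts` values, the Euclidean distance and the sample points are assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels

# Hypothetical two-dimensional data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.5, min_pts=3)
outliers = [p for p, label in zip(points, labels) if label == NOISE]
print(outliers)  # -> [(9.0, 0.5)]
```

Because the distance is computed over the whole point, every feature contributes to the decision, in contrast to the one-dimensional tests above.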
Another approach to Anomaly Detection is based on sensor data over time, which requires the usage of time series analysis techniques. We have the basics of that covered in the section below. Nidhi Sawhney and Rafael Pacheco showcase two scenarios in these three articles:
- Quality Identification of rail-road tracks – An application of Dynamic Time Warping using SAP HANA PAL
- Anomaly Forecast of Sensor Data in Energy Intensive Industries – Part I: The Machine Learning and Beer Production
- Anomaly Forecast of Sensor Data in Energy Intensive Industries – Part II: The Machine Learning Execution
Finally, Raymond Yao shows us how to apply Weibull Analysis – one of the most used algorithms for Predictive Maintenance use cases.
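As a rough illustration of what a Weibull model expresses in a maintenance context, the sketch below evaluates the two-parameter Weibull reliability function and the corresponding "B-life" (the time by which a given fraction of components is expected to have failed). The shape and scale values are hypothetical; in a real analysis they would be estimated from failure data, which this sketch does not cover.

```python
import math

def weibull_reliability(t, shape, scale):
    """Probability that a component survives beyond time t."""
    return math.exp(-((t / scale) ** shape))

def weibull_b_life(fraction_failed, shape, scale):
    """Time by which the given fraction of components is expected to have failed."""
    return scale * (-math.log(1.0 - fraction_failed)) ** (1.0 / shape)

# Hypothetical parameters: shape > 1 models wear-out, scale is in operating hours.
shape, scale = 1.5, 1000.0

b10 = weibull_b_life(0.10, shape, scale)
survival_at_500 = weibull_reliability(500.0, shape, scale)
print(f"B10 life: {b10:.1f} h, survival at 500 h: {survival_at_500:.3f}")
```

A shape parameter below 1 would describe early failures and a value of exactly 1 a constant failure rate, which is why the fitted shape is often the first number to look at.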
Another series of great articles from Likun Hou covers the most relevant aspects of Time-Series Analysis.
He starts off by explaining the basic principles of Time-Series Analysis, specifically the ideas of “Trends” and “Seasonality”, and how to decompose a series into these components to prepare for Anomaly Detection.
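A classical additive decomposition illustrates the two ideas: the trend is estimated with a centered moving average, and the seasonal pattern is the average of what remains at each position in the cycle. The sketch below is a simplified textbook variant in plain Python, not the PAL procedure; the function name, the even-period restriction and the synthetic series are assumptions.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    if period % 2 != 0:
        raise ValueError("this sketch only handles an even period")
    n, half = len(series), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full cycles")

    # Trend: centered moving average, the two end points get half weight.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period

    # Seasonality: mean detrended value per position in the cycle, centered on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(n)]

    # Residual: what neither trend nor seasonality explains (undefined at the edges).
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual

# Synthetic series: linear trend plus a repeating pattern of length 4.
true_pattern = [2, -1, 0, -1]
series = [t + true_pattern[t % 4] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(seasonal[:4])
```

For anomaly detection, the residual is the interesting part: a point whose residual is unusually large does not fit the expected trend and seasonal pattern.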
The second article explains how to apply the most commonly used techniques for Time-Series Analysis: Exponential Smoothing and ARIMA.
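Simple exponential smoothing is the most basic member of that family and fits in a few lines. The sketch below is plain Python rather than a PAL call; the smoothing factor `alpha`, the choice to initialize with the first observation and the sample values are assumptions.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # initialize with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed

observations = [10, 12, 11, 13]  # hypothetical measurements
smoothed = exponential_smoothing(observations, alpha=0.5)
forecast = smoothed[-1]  # the one-step-ahead forecast is the last smoothed value
print(smoothed, forecast)  # -> [10, 11.0, 11.0, 12.0] 12.0
```

A larger `alpha` reacts faster to recent observations, while a smaller one produces a smoother curve. Variants with additional trend and seasonal terms, as well as ARIMA, build on the same idea of weighting past values.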
Lastly, Likun introduces one of the most recent enhancements of the Predictive Analysis Library (PAL): Additive Model Time-Series Analysis, an advanced approach that proves to be superior in dealing with time series that have a complicated trend, multiple seasonality and cyclic patterns.
Xin Chen dedicated another article to Seasonal Decomposition, showcasing examples of how this can be done with SAP HANA PAL.
While many Time-Series scenarios are based on just one time-dependent variable, there are also many cases where Time-Series consist of more than one time-dependent variable and each variable depends not only on its past values but also has some dependency on other variables. These Multivariate Time-Series are covered in this article:
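The cross-dependency between variables can be illustrated with a first-order vector autoregression, where each variable's next value is a weighted sum of the previous values of all variables. The sketch below only shows a one-step forecast with made-up numbers; the coefficient matrix and intercept are hypothetical and would normally be estimated from data, and the linked article may use a different model.

```python
import math

def var1_forecast(last_values, coefficients, intercept):
    """One-step forecast of a VAR(1) model: y_next = intercept + A * y_last."""
    return [
        c + sum(a * y for a, y in zip(row, last_values))
        for row, c in zip(coefficients, intercept)
    ]

# Hypothetical model with two variables.
# Row i holds the weights that variable i puts on the previous values of both variables.
coefficients = [
    [0.5, 0.2],  # variable 0 depends on its own past and on variable 1
    [0.1, 0.7],  # variable 1 depends on variable 0 and on its own past
]
intercept = [1.0, 0.0]
last_values = [10.0, 20.0]

next_values = var1_forecast(last_values, coefficients, intercept)
print(next_values)
```

Setting the off-diagonal coefficients to zero reduces this to two independent univariate models, which is exactly the dependency a multivariate approach is meant to capture.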
The most recent enhancement to the SAP HANA Machine Learning features is the Text Mining feature. The initial version allows for the analysis and classification of texts, such as service tickets or text messages, and enables users to explore relations among the texts. Learn how to make use of this feature in Alex Dalentzas' blog post:
External Machine Learning
The third flavor of SAP HANA Machine Learning is the integration of external Machine Learning servers. It mainly allows us to remotely execute Machine Learning models in TensorFlow or R on separate servers, using data from an SAP HANA database (on-premises) and consuming the results back in SAP HANA.
These two articles provide an overview of the R server integration.
- SAP HANA and R hands-on: From freestyle to deployment
- Parallelization options with the SAP HANA and R-Integration
More information on the TensorFlow integration can be found in these posts from Philip Mugglestone and Nandi Kishore:
- Introducing SAP HANA External Machine Learning (aka TensorFlow Integration)
- Tensorflow Machine Learning Model Integration with SAP HANA
Sometimes, you have the right data, in the right place, at the right time, but your scenario requires them to be turned by 90 degrees. That can be cumbersome, but Nidhi Sawhney shows us how pivoting can be easily done using SAP HANA and the Python API.
This article describes how to import multiple Excel files into a single SAP HANA table using the Python Machine Learning client for SAP HANA.
If you are looking for more code examples on the use of SAP HANA Machine Learning, please take a look at our sample repository on GitHub, where you will find dozens of examples for the various use cases of Machine Learning.
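What "turning the data by 90 degrees" means can be shown with a small pivot in plain Python: rows in long format (one row per region and quarter) become one row per region with one column per quarter. This sketch runs locally and does not use the SAP HANA Python API; the function, its parameters and the sample rows are assumptions for illustration.

```python
from collections import defaultdict

def pivot(rows, index, columns, values, aggfunc=sum):
    """Turn long-format rows into a nested dict: {index value: {column value: aggregate}}."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[index], row[columns])].append(row[values])
    index_keys = sorted({i for i, _ in cells})
    column_keys = sorted({c for _, c in cells})
    return {
        i: {c: aggfunc(cells[(i, c)]) if (i, c) in cells else None for c in column_keys}
        for i in index_keys
    }

# Hypothetical long-format data.
sales = [
    {"region": "EMEA", "quarter": "Q1", "revenue": 100},
    {"region": "EMEA", "quarter": "Q2", "revenue": 120},
    {"region": "APJ", "quarter": "Q1", "revenue": 80},
    {"region": "EMEA", "quarter": "Q1", "revenue": 30},
]
wide = pivot(sales, index="region", columns="quarter", values="revenue")
print(wide)
# -> {'APJ': {'Q1': 80, 'Q2': None}, 'EMEA': {'Q1': 130, 'Q2': 120}}
```

Combinations that do not occur in the input, such as APJ in Q2 here, show up as `None` in the wide result.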
I would like to thank all the above contributors for their tremendous effort and time to create these valuable materials!
As said in the introduction, this article will receive updates any time new relevant content gets created. If there is anything you miss in this collection (either because we missed it or because there are no resources on your specific topic), do not hesitate to reach out to Christoph Morgen (SAP HANA Product Management) or me (SAP HANA Solution Management).
Also, we are happy to take your feedback, thoughts or questions in the comment section below!