
S/4HANA and Machine Learning for Solution Architects

Last updated: April 8 2020 (update reflects SAP Data Intelligence 3.0)

In my short blogs I try to provide an interesting overview of selected topics for SAP Solution Architects. This time it is Machine Learning (ML). I give an overview of the SAP technology for Machine Learning, introduce some pre-built application-specific scenarios and show where to find more information. The blog does not cover implementation or technical details, nor does it cover conversational Artificial Intelligence or Robotic Process Automation (RPA).

What is Machine Learning?

Machine learning is a disruptive technology that is forecast to grow significantly and is one of the key innovations in digital transformation. It is a subset of Artificial Intelligence (AI) and predicts results using models that learn from large sets of sample data. In traditional solutions, humans create the rules, which works well when requirements are clear and the data is structured. Machine Learning can automate processes or decisions that depend on complex rules, using structured data (e.g. database tables) or unstructured data (such as natural language and images). An example is accurately classifying service tickets based on their text content (links to more information are below). The business benefit is that people can focus on tasks that add more value. The picture below shows possible ML scenarios in the Procure to Pay process.

There is a huge choice of software and solutions for ML. Whilst it is clear that incredibly complex problems can be solved by AI (e.g. self-driving cars), the question for organisations using SAP enterprise software is whether ML can be applied to business scenarios with a clear return on investment. S/4HANA makes it easy to try simple ML scenarios and then expand into more ambitious solutions.
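
To make the idea of a model that learns from sample data concrete, here is a minimal sketch of the service ticket example mentioned above. It is an illustration only and not how any SAP product implements ticket classification: the training tickets, the labels and the function names are all made up, and the technique (a small naive Bayes classifier in plain Python) is just one simple option among many.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-made training tickets (label, text); not real SAP data.
TRAINING_TICKETS = [
    ("billing", "invoice shows the wrong amount"),
    ("billing", "charged twice on my invoice"),
    ("billing", "refund for duplicate payment"),
    ("access", "cannot log in to my account"),
    ("access", "password reset link does not work"),
    ("access", "account locked after failed login"),
]


def tokenize(text):
    """Split a ticket into lower-case words."""
    return text.lower().split()


def train(tickets):
    """Count words per label; these counts are the learned 'model'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in tickets:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_tickets = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the prior: how common this label is in the samples.
        score = math.log(label_counts[label] / total_tickets)
        label_total = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (label_total + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_TICKETS)
print(classify("my invoice amount is wrong", model))
print(classify("login fails and account is locked", model))
```

Nobody wrote a rule such as "if the text contains invoice, route to billing". The behaviour comes entirely from the sample tickets, which is why the quality and volume of training data matter so much in real projects.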

SAP Solutions for ML

SAP provides two technology sets for ML: SAP Data Intelligence and Embedded ML. Both approaches 1) provide pre-built application-specific business scenarios and 2) allow you to build your own machine learning solutions.

SAP Data Intelligence provides a portfolio of deep learning capabilities that can work with vast quantities of structured and unstructured data from SAP and from external cloud and on-premise sources such as Hadoop HDFS, AWS S3 and Azure Data Lake. It includes open source frameworks and algorithms familiar to data scientists, such as the languages Python and R and libraries such as TensorFlow and Pandas. It is based on the SAP Data Hub solution and SAP Cloud Platform ML. This is SAP’s go-to solution for building and operating ML solutions. It is ideal for combining non-SAP big data with SAP enterprise data. SAP Data Intelligence is available as a cloud solution or on-premise.

Embedded ML works within the S/4HANA instance and uses predictive analytics for simpler scenarios based mainly on SAP structured data. It suits use cases with low demands on data volume, RAM and CPU. Examples of embedded SAP solutions include “early detection of slow or non-moving goods” and “payment block cash discount at risk”. Applications can access a machine learning framework through the Predictive Analytics Integrator (PAI). The framework provides Fiori apps to train and manage machine learning models.
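
To give a feel for what a simple predictive scenario on structured data looks like, here is a minimal sketch inspired by the slow or non-moving goods example. It is not the algorithm used by the embedded SAP scenario and it does not call any SAP API. The material numbers, quantities, threshold and function names are hypothetical, and the technique is a plain least-squares trend line over monthly sales.

```python
# Hypothetical monthly sales quantities per material (structured, table-like data).
MONTHLY_SALES = {
    "MAT-001": [120, 118, 125, 130, 128, 135],
    "MAT-002": [60, 48, 35, 22, 12, 5],
    "MAT-003": [0, 0, 1, 0, 0, 0],
}


def fit_trend(quantities):
    """Fit quantity = intercept + slope * month by ordinary least squares."""
    n = len(quantities)
    months = range(n)
    mean_x = sum(months) / n
    mean_y = sum(quantities) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, quantities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next_month(quantities):
    """Extend the fitted trend one month ahead; demand cannot be negative."""
    intercept, slope = fit_trend(quantities)
    return max(0.0, intercept + slope * len(quantities))


def flag_slow_movers(sales, threshold=10.0):
    """Return materials whose forecast demand falls below the threshold."""
    return sorted(
        material
        for material, quantities in sales.items()
        if forecast_next_month(quantities) < threshold
    )


print(flag_slow_movers(MONTHLY_SALES))
```

With this sample data the steadily selling material is left alone, while the declining one and the one that barely sells are flagged. A real scenario would train on far more history and more features, but the pattern is the same: learn from past records in database tables, then score current records.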

The ML scenarios available to you vary depending on whether you are using SAP S/4HANA on-premise, SAP Cloud Platform or SAP S/4HANA Cloud. Some, such as SAP Cash Application, require an additional license. Here are some examples:

See this ASUG blog for lists of intelligent scenarios (many with links to more information):



Find out More

If you are new to machine learning and would like to understand the principles and how it works, you could look at the openSAP course “Enterprise Machine Learning in a Nutshell”. This useful self-explanatory picture is taken from the course:

For more information on SAP Data Intelligence, see the SAP page:

SAP help for the cloud version of SAP Data Intelligence is here:

SAP Data Intelligence is also available under a BYOL (bring your own license) model, where it can be deployed on-premise in your own data center, on any hyperscaler public cloud (AWS, Google, Microsoft) or in a private cloud. See SAP help here:

Data scientists familiar with ML technology can look at the openSAP course SAP Data Intelligence for Enterprise AI:

Solution Architects new to ML should first get an introduction to SAP Data Hub from the openSAP course Freedom of Data with SAP Data Hub:

SAP Data Hub is built using Docker containers managed in Kubernetes.

For information on how the Predictive Analytics Integrator (PAI) can be used for embedded machine learning, see this blog:

ML Jargon Buster

There are lots of new technology layers to understand. Here is my jargon buster to save Solution Architects time searching on Google.

  • Apache Spark: platform for high-volume distributed processing of big data. Often used with Hadoop HDFS. Processing includes SQL, batch processing, stream feeds and machine learning.
  • Azure Data Lake: Microsoft’s big data analytics solution. Often encountered when using ML for big data.
  • Data Hub: SAP’s solution to catalog, orchestrate and automate the processing of big data. Data is not persisted; this is not a data warehouse. It is used to build and operate Machine Learning and Internet of Things (IoT) solutions. It is used in conjunction with SAP Analytics Cloud for data intelligence across BW, HANA, S/4HANA and non-SAP data lakes. It uses Docker and Kubernetes technology for massive scalability.
  • Docker: used to create, deploy and run applications using “containers”. Docker containers are lightweight and fast. Unlike Virtual Machines, which each run a full guest Operating System, containers share the host Operating System kernel, which makes them fast to start. An application is typically made up of a number of containers linked together. Docker is open-source. This technology is used in SAP Data Hub.
  • Hadoop: open-source framework that provides massive distributed storage for structured and unstructured data together with massive processing power. Its clustered file system is called HDFS. Often managed on-premise.
  • Jupyter Notebook: a tool used by data scientists to integrate code and its data output into a single document. Commonly used with Python for ML work. Jupyter Notebook is an open-source web application. Used for data cleaning, statistical modeling and simulations, data visualization and machine learning.
  • Kafka: an open-source software platform for handling high-throughput, real-time data feeds. Originally developed by LinkedIn, it later became an open-source project of the Apache Software Foundation. Can be used in SAP Data Hub operators.
  • Kubernetes: system for managing containerized applications across a cluster of machines (often Virtual Machines). Kubernetes is open-source and draws on Google’s Borg system. It provides massive scalability. This technology is used in SAP Data Hub.
  • MQTT: Message Queuing Telemetry Transport (MQTT) is a commonly used protocol in IoT (Internet of Things) projects. Can be used in SAP Data Hub operators.
  • Python: open-source language used in many Artificial Intelligence (AI) projects with a huge library for machine learning. Libraries include Keras, TensorFlow and Pandas. Can be used in SAP Data Hub operators.
  • R: open-source language used to analyze and manipulate data for statistical purposes. Can be used in SAP Data Hub operators.
  • S3: Amazon Web Services (AWS) scalable, high-speed, web-based cloud storage. Often encountered when using ML for big data.
  • Vora: SAP scalable distributed in-memory database used to cache or persist data in SAP Data Hub data pipelines. It runs on a Hadoop cluster and is tightly integrated with Apache Spark.


I hope you found this blog informative. I would be happy to see your feedback on using SAP Machine Learning solutions.

Amin Hoque
Enterprise Architect at SAP Services UK




