Unlocking business insights by integrating Machine Learning in SAP Data Intelligence Cloud
This blog provides guidance on developing a fundamental ML project within SAP Data Intelligence, demonstrating the creation of an ARIMA model in Python. Using a date range and sales data, we aim to forecast sales for the next 30 days. Subsequently, we’ll expose this trained model as a REST API for inference, enabling your business applications to integrate these predictions into your daily operations.
Following this introductory tutorial, you will gain an in-depth understanding of the key elements of an ML scenario in SAP Data Intelligence, making it easier to implement more advanced requirements using a comparable approach.
Intelligent Processing with the ML Scenario Manager
In SAP Data Intelligence, the central machine learning application, referred to as the ML Scenario Manager, serves as a hub for efficiently organizing your data science resources and overseeing all tasks related to your work. This multifaceted data science tool is designed around the central concept of machine learning (ML) scenarios. An ML scenario can encompass datasets, pipelines, and Jupyter Notebooks. Moreover, within a scenario, you have the ability to monitor model performance metrics and review deployment history.
As part of your comprehensive workflow, you can create different versions of an ML scenario, and when required, initiate a new branch based on a previous iteration. Typical activities within the ML Scenario Manager include:
- Managing datasets and model artifacts
- Creating Jupyter notebooks for experimentation
- Setting up and overseeing data pipelines
- Analyzing execution results and performance metrics
- Tracking and versioning your model deployments
We used the ARIMA model, ARIMA standing for “AutoRegressive Integrated Moving Average.” It is a widely used statistical method and a type of time series forecasting model, employed for understanding and making predictions about time series data.
To start with, I uploaded the sales data into the DI Data Lake via the Metadata Explorer.
Step 1: Create the ML Scenario
Let’s begin our journey into the Machine Learning project. On the main SAP Data Intelligence page, access the “ML Scenario Manager.” This serves as the central hub for all Machine Learning-related activities.
Click the small “+” symbol located in the upper right corner to start crafting a new scenario. Name it “ARIMA_PREDICTION” and proceed to fill in additional details within the “Business Question” section. Complete the process by clicking the “Create” button to confirm the scenario creation.
Step 2: Load And Explore the Data in a Notebook
To load and explore the data within a Notebook, follow these steps:
- Navigate to the “Notebooks” tab and click the “+” icon.
- Name the Notebook “10 Data Exploration and Model Training.”
- Click “Create,” and the Notebook will open.
- You’ll be prompted to select a kernel. Retain the default option, which is “Python 3.”
From the Jupyter Lab Data Manager, I go to my Data Collection and copy the code to load my training data.
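The copied snippet is specific to your data collection and is not reproduced here. As a stand-in, the sketch below shows the kind of loading and quick exploration you would do in the notebook, using an inline CSV sample so it is self-contained; the `DATE` and `SALES` column names are assumptions, not the tutorial’s actual schema:

```python
import io
import pandas as pd

# Inline sample standing in for the CSV read from the DI Data Lake.
csv_data = io.StringIO(
    "DATE,SALES\n"
    "2023-01-01,120\n"
    "2023-01-02,135\n"
    "2023-01-03,128\n"
)
df = pd.read_csv(csv_data, parse_dates=["DATE"], index_col="DATE")

# Quick exploration, as you would do interactively in the notebook.
print(df.describe())
print("range:", df.index.min(), "to", df.index.max())
```

Parsing the date column and setting it as the index up front pays off later, since ARIMA expects a time-indexed series.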
Step 3: Creating Pipeline and Deployment
All the necessary components are now in position to initiate the deployment of the model using two graphical pipelines:
- The first pipeline is designed to train the model and store it within the ML Scenario.
- The second pipeline is responsible for exposing the model as a REST API, facilitating inference.
To create the graphical pipeline to retrain the model, access your ML Scenario’s main page, select the “Pipelines” tab, and finally click the “+” sign.
Name the pipeline “ARIMA_CONSUMER” and select the “Blank” template. This creates an empty pipeline, in which we then build the flow shown below.
- Data Loading: The pipeline begins by utilizing the “Read File” operator to load the necessary data.
The sales data has been read from the DI Data Lake.
- Model Training: The loaded data is then forwarded to a Python operator, where the machine learning model is trained using the script below.
- Model Storage: Within the same Python operator, the trained model is stored within the ML Scenario via the “Artifact Producer” component.
- Quality Metric Calculation: The Python operator’s second output is used to calculate a quality metric for the model. This metric is also passed to the ML Scenario.
- Pipeline Completion: After both the model and its associated quality metric are successfully saved, the pipeline execution is concluded using the “Graph Terminator.”
Before executing the pipeline, we need to create a Dockerfile. To do so, follow the steps below:
- Access Repository: Start by accessing the “Repository” tab, typically located on the left-hand side of your interface.
- Navigate to Dockerfiles: Within the “Repository” tab, navigate to the “Dockerfiles” folder.
- Right-click and Create: Right-click on the “Dockerfiles” folder. This action will open a context menu.
- Select “Create Docker File”: From the context menu, select the option labeled “Create Docker File.”
By following these steps, you’ll initiate the process of creating a Docker image specifically tailored for your Python operator. This image will include the necessary libraries, granting you the flexibility to employ a wide range of Python libraries in your pipeline.
Include a unique tag to associate this Dockerfile with the scenario, and label it “ARIMA_SALES.”
The configuration should appear as follows:
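As a rough illustration, a Dockerfile for this operator might look like the fragment below. The base image reference and package list are assumptions; check the SAP Data Intelligence documentation for the base images available in your tenant, and note that the “ARIMA_SALES” tag is set in the Dockerfile’s Configuration panel, not in the file itself:

```dockerfile
# Sketch only -- base image and package versions are assumptions.
FROM $com.sap.sles.base
RUN python3 -m pip --no-cache-dir install --user pandas statsmodels
```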
Now save the Docker file and click the “Build” icon to start building the Docker image.
Wait a few minutes and you should receive a confirmation that the build was completed successfully.
Now, proceed with configuring the Python operator within the model training stage to make use of the designated Docker image. Revisit the “ARIMA_CONSUMER” graphical pipeline, right-click on the “Python 3” operator, and choose the “Group” function.
This grouping can encompass a single or multiple graphical operators. At this group level, you can designate the Docker image to be utilized. Choose the group that encloses the “Python 3” Operator. Within the group’s Configuration settings, designate the “ARIMA_SALES” tag. Remember to save the graph once this configuration is set.
The pipeline is now fully configured and ready for execution.
Return to the ML Scenario, select the “ARIMA_CONSUMER” Pipeline, and proceed to click on the “Execute” button to initiate the training and model saving process.
You can skip the optional steps until you reach the “Pipeline Parameters.” In the “Pipeline Parameters” section, set “newArtifactName” to “TEST_1.” The trained ARIMA model will be saved with this name.
Click on “Save.” Now, wait a while until the pipeline executes and completes.
With our model successfully trained and saved, the next step is to leverage it for real-time inference.
Step 4: Making Predictions via REST-API
Return to the main page of your ML Scenario and create a second pipeline. This pipeline is designed to offer a REST API for acquiring real-time predictions, so name it “REST_API_ARIMA” and opt for the “Blank” template when prompted. Use the flow shown below for the pipeline.
The “OpenAPI Servlow” operator serves as the REST API provider, while the “Artifact Consumer” is responsible for loading the trained model stored in our ML scenario. The “Python36 – Inference” operator acts as the bridge between these two components. It receives input from the REST API call and leverages the loaded model to generate predictions. These predictions are subsequently relayed back to the client, which initiates the REST API call, by the “OpenAPI Servlow” operator.
To implement the desired prediction, you should make modifications exclusively to the “Python36 – Inference” operator. Open the operator’s “Script” window and apply the following script for prediction.
After saving the modifications, return to the ML Scenario. Then, proceed to deploy the newly updated pipeline.
Click through the screens and select the trained model from the drop-down. Now, click “Save.”
Wait a few seconds while the pipeline is running.
While the pipeline is active, you have access to the REST API for inference. To utilize it, follow these steps. These instructions are based on using Postman, though you can choose any other tool you prefer.
- Copy the deployment URL from the above screen. If you open it directly, you might encounter a message like “Service at path XYZ does not exist.” This is because the URL is not yet complete.
- Open Postman.
- Paste the Deployment URL as the request URL.
- Extend the URL by adding “v1/uploadjson/” to it.
- Change the request type from “GET” to “POST.”
Now you can proceed with testing the REST API for inference.
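If you prefer to script the call instead of using Postman, the sketch below builds the same POST request with the standard library. The deployment URL, the `tenant\user` credential format, and the request payload are all hypothetical placeholders; the request is constructed but not sent, since that requires a live deployment:

```python
import base64
import json
import urllib.request

# Hypothetical values -- replace with your deployment URL and credentials.
DEPLOYMENT_URL = "https://vsystem.ingress.example.ondemand.com/app/pipeline-modeler/service/abc123"
ENDPOINT = DEPLOYMENT_URL.rstrip("/") + "/v1/uploadjson/"
USER = "default\\myuser"   # assumed tenant\user basic-auth format
PASSWORD = "secret"

def build_request(payload: dict) -> urllib.request.Request:
    """Build the POST request Postman would send (not executed here)."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )

req = build_request({"days": 5})
print(req.method, req.full_url)
# To actually call the service: urllib.request.urlopen(req).read()
```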
This is the result we see in Postman: the predicted sales for the next 5 days.
The key takeaway from this tutorial lies in gaining a solid understanding of the crucial elements involved in an ML scenario within SAP Data Intelligence. The ML Scenario Manager, acting as a central hub for all machine learning-related activities, efficiently organizes data resources and associated tasks. By creating and managing scenarios encompassing datasets, pipelines, and Jupyter notebooks, users can monitor model performance metrics, review deployment history, and seamlessly conduct analysis.
By empowering you to navigate through the ML Scenario Manager and initiate the deployment of ML models via graphical pipelines, we hope this guide will serve as a stepping stone toward implementing more advanced requirements within the realm of ML using a comparable approach. We encourage you to explore the capabilities that SAP Data Intelligence offers to unlock insights and drive actionable intelligence within your organization.