MLOps in practice: Applying and updating Machine Learning models in real-time and at scale
Nowadays, businesses and organizations increasingly rely on real-time machine learning insights to adapt to change and maintain a competitive edge – e.g. by better understanding their customers, by reacting to certain events or disruptions, or simply by predicting future business impact. The trend is visible across all industries and lines of business – (i) from manufacturing and other asset-heavy organizations, which need to react to complex IoT sensor values streamed at high speed, (ii) via banks, which use transaction data to detect credit card fraud on the spot, (iii) up to retailers, who apply marketing techniques to determine and deliver the optimal approach for a specific customer at the right time.
All those processes require the output of the machine learning models to be available in real time, so that an appropriate reaction can be triggered in an adequate timeframe. This real-time requirement alone poses a challenge for every IT architecture. On top of that, additional factors – e.g. the need to enrich the data by accessing remote systems (e.g. for business metadata) or to handle a high number of simultaneous requests per second – increase the complexity of the landscape further.
In this blog post, a solution in SAP Data Intelligence is described which scales to several thousand model scoring requests per minute and returns the model results for each request in real time (less than 5 ms). Furthermore, an example of a message-based concept for the automated deployment of new model versions, without interruptions in the productive system, is shown. These model operationalization tasks are often referred to as MLOps and cover the steps after an appropriate model has been selected and optimized by a data scientist.
Model development is not covered in this article, but SAP Data Intelligence offers a collection of features to enable the whole end-to-end machine learning process. There are many excellent posts providing details on how to use ML Scenarios, how to access data across various landscapes (e.g. SAP S/4, data lakes), and how to realize a simple CI/CD pipeline.
For that reason, this blog post focuses on the operationalization and scaling of machine learning models. To simplify reading, the concepts are discussed in the context of a real-time marketing use case for a retailer. The objective of this use case is to have a machine learning model predict the probability of successfully upselling a given product to a customer while that customer is browsing the website. Based on the model prediction, the website should be adapted dynamically to present content from the most suitable personalized marketing campaign. The technical requirements arising from this scenario are:
- Overall scoring speed: To guarantee a seamless experience for the user, the whole process should be finished in less than 10 ms.
- Scalability: In a realistic scenario, the system should be able to handle several thousand requests per minute.
- Model update: It is crucial to be able to update the underlying models without introducing downtimes in the process.
The solution is presented below with emphasis on those three components.
High level view and flow of the process
The figure shows an example high-level architecture developed for this use case, along with the involved components.
High level architecture and flow
The main software components are a Content Management System (CMS), an Apache Kafka server, and SAP Data Intelligence. Within SAP Data Intelligence, two pipelines are present – a model training pipeline and a model apply pipeline.
The logical flow can be summarized as follows:
- The process is triggered by a customer opening the retailer app, which internally requests the homepage of the retailer.
- The CMS system, responsible for the webpage contents, forwards the customer ID (known through the app) along with additional context-related parameters (e.g. geolocation) to the ML application pipeline via a Kafka message (see the example message sketch after this list).
- The apply pipeline consolidates all available data for this customer, e.g. by pulling it from remote systems like SAP Hybris Marketing, etc.
- The current model is applied to the enriched analytical dataset to calculate the upsell probability.
- Finally, the results are sent back to the CMS system, which adapts the contents of the webpage (e.g. by adding a special offer) and sends it to the requesting app.
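To make the trigger step more tangible, the following minimal sketch shows how the CMS side could publish such a request as a Kafka message using the Node.js kafkajs client. The client library, broker address, topic name, and message fields are illustrative assumptions and not part of the original implementation:

const { Kafka } = require('kafkajs');

// Illustrative broker and client settings
const kafka = new Kafka({ clientId: 'cms-frontend', brokers: ['kafka-broker:9092'] });
const producer = kafka.producer();

// Publish a scoring request for one customer visiting the website
async function requestScoring(customerId, context) {
  await producer.connect();
  await producer.send({
    topic: 'ScoringRequests',                              // assumed topic name
    messages: [{
      key: String(customerId),
      value: JSON.stringify({ customerId, ...context })    // e.g. { customerId: 4711, geolocation: 'DE' }
    }]
  });
  await producer.disconnect();
}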
Overall scoring speed
Two components from the flow above are the main contributors to the overall scoring speed – (i) the time the CMS system needs to trigger the application pipeline and (ii) the actual execution of the application pipeline. We used Kafka messaging to address the first point and JavaScript-based scoring of the model within SAP Data Intelligence to increase the execution speed of the application pipeline.
Kafka-based communication
All communication – i.e. between the training and scoring pipelines, as well as bringing new data to the scoring pipeline – was realized using Kafka messages.
The scoring pipeline (simplified) communicates with the rest of the components using Kafka messages
Kafka is a widely used tool to stream event or sensor data in real time. The distribution of messages follows a publish-subscribe concept involving producers, consumers, and brokers. The Kafka broker (server) is the central entity for the messages, while producers push records under a specific topic and consumers pull the messages on the receiving end.
REST-based APIs were also tested but did not provide all the flexibility needed for the use case. The Kafka message approach showed very stable behavior and was capable of reacting to the amount of incoming traffic within the required time interval. The out-of-the-box Kafka Consumer and Kafka Producer operators in SAP Data Intelligence make it really easy to set up the communication. The only information needed is the broker address and the message topic.
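On the receiving side, e.g. when the CMS listens for the scoring results, the same publish-subscribe pattern applies. Below is a minimal sketch, again using the kafkajs client with assumed broker, topic, and field names (inside SAP Data Intelligence itself, the out-of-the-box operators take care of this wiring):

const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'cms-result-listener', brokers: ['kafka-broker:9092'] });
const consumer = kafka.consumer({ groupId: 'cms' });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topics: ['ScoringResults'] });   // assumed topic name
  await consumer.run({
    eachMessage: async ({ message }) => {
      const result = JSON.parse(message.value.toString());
      // e.g. { customerId: 4711, probability: 0.83 } – used to adapt the webpage contents
      console.log(`Upsell probability for ${result.customerId}: ${result.probability}`);
    }
  });
}
run().catch(console.error);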
Scoring APL models using JavaScript
In the case of this customer, the binary classification model for the upsell probability was based on the Gradient Boosting algorithm in SAP HANA's Automated Predictive Library (APL). APL is a machine learning library which allows the automated training and tuning of models within SAP HANA, without moving the data out of the database. It also delivers global and local explainability values for a better understanding of the model. With a recent update, it is now technically possible to load and apply APL models which have been previously trained in SAP HANA and exported in JSON format. Currently, this is achieved by a JavaScript runtime, and more details on this approach can be found in the blog post by Andreas Forster and Marc Daniau – Hands-On Tutorial: Score your APL model in stand-alone JavaScript.
Since JavaScript (JS) is platform-independent, the scoring code can be executed on various target systems. In this project, the JavaScript machine learning application code was hosted in a Node.js operator of SAP Data Intelligence.
Node.js operator for scoring the APL models on the fly
The JS scoring code has several dependencies on external files, which have to be made available at runtime (see the JavaScript scoring section of the blog post by Andreas and Marc for further information on where to find the required files and how to install them). For this purpose, they need to be added to a Docker image using a Dockerfile definition similar to this:
# SAP Data Intelligence base image
FROM $com.sap.sles.base
RUN mkdir /tmp/JS/
# Copy the JS runtime dependencies required to score the exported APL model
COPY --chown=vflow:vflow amdefine.js /home/vflow/amdefine.js
COPY --chown=vflow:vflow autoRuntime.js /home/vflow/autoRuntime.js
COPY --chown=vflow:vflow dateCoder.js /home/vflow/dateCoder.js
COPY --chown=vflow:vflow utils.js /home/vflow/utils.js
After the Docker image is built and tagged, the tags need to be added to the group of the scoring operator.
Tagging the Docker image for later reference in the pipeline
With those settings it is possible to host and execute the JS-based scoring code in an SAP Data Intelligence pipeline. This allows the data and the scoring to remain in the system, which minimizes data movement and decreases the overall execution time. A minimal sketch of such a scoring snippet is shown below.
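The following sketch outlines what the scoring logic inside the Node.js operator could look like. The wiring to the operator's input and output ports is omitted, the model path and field names are assumptions, and createEngine/getScore stand in for the load and apply helpers of the APL JavaScript runtime described in the blog post by Andreas and Marc (the exact function names may differ):

const fs = require('fs');
// APL JS runtime helpers copied into the Docker image (see Dockerfile above)
const autoRuntime = require('/home/vflow/autoRuntime');

// Read the APL model previously trained in SAP HANA and exported as JSON
// (the file path is an assumption for this sketch)
const modelJson = JSON.parse(fs.readFileSync('/home/vflow/model/upsell_model.json', 'utf8'));
const engine = autoRuntime.createEngine(modelJson);     // placeholder for the runtime's load call

// Score one enriched customer record and return the upsell probability
function scoreCustomer(record) {
  // record = data from the Kafka message enriched with attributes pulled
  // from remote systems such as SAP Hybris Marketing
  const probability = engine.getScore(record);          // placeholder for the runtime's apply call
  return { customerId: record.customerId, probability };
}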
Scalability
Another approach to improve execution speed is parallelization. The multiplicity parameter for a group of operators in SAP Data Intelligence instructs the system to execute parts of the pipeline in parallel. For this purpose, the existing scoring pipeline was updated by adding a new group and a Python operator:
Scoring pipeline adapted to handle the multiplicity setting
In the settings of the Scoring group it is possible to specify a multiplicity value as shown in the following figure:
Multiplicity controls the number of parallel pods started to execute the specified group of operators
Setting a multiplicity value of 5 for the Scoring group has the effect that SAP Data Intelligence creates 5 independent pods running in parallel for this part of the pipeline. Each of those pods loads the current model and waits for incoming data to be scored. This means that 5 model instances are ready to process new data points, which parallelizes the scoring task and thus accelerates the end-to-end flow. The creation, starting, and stopping of the parallel pods, as well as the distribution of data to the available pods, is managed by SAP Data Intelligence for the user.
Model update
Model update is an important MLOps topic. On the one hand, models need to be updated regularly, at a frequency that depends on several factors such as the dynamics of the process being modeled. On the other hand, productive model applications should not be interrupted to update the model version. Additionally, for many use cases it is important that the process applying the model is aware that a new model is available, so that a reload can be scheduled.
The figure below shows a simplified version of a training and validation pipeline, which uses Kafka messages (e.g. via a NewModel topic) to notify potential application pipelines that a new model has been trained and is now available in the repository.
Training pipeline uses Kafka producer to push a message that a new model is available (simplified)
The application pipeline listens to the NewModel topic, and as soon as a message is received, a model loading process is triggered. Once the model is loaded from the repository, it immediately becomes available to predict the upsell probability for the next customer request that reaches the system.
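As an illustration of this hot-swap behavior, the sketch below listens to both topics with the kafkajs client and replaces the model in memory when a NewModel message arrives, while scoring requests continue to be served. The topic names and the loadModel helper are assumptions; in the actual pipeline, the SAP Data Intelligence operators handle this wiring:

const { Kafka } = require('kafkajs');
// loadModel is a hypothetical helper that reads an exported APL JSON model from the repository
const { loadModel } = require('./modelLoader');

const kafka = new Kafka({ clientId: 'apply-pipeline', brokers: ['kafka-broker:9092'] });
const consumer = kafka.consumer({ groupId: 'scoring' });

async function run() {
  let engine = await loadModel('latest');              // model currently used for scoring

  await consumer.connect();
  await consumer.subscribe({ topics: ['NewModel', 'ScoringRequests'] });

  await consumer.run({
    eachMessage: async ({ topic, message }) => {
      if (topic === 'NewModel') {
        // a new model version was trained: reload it without interrupting the process
        engine = await loadModel(message.value.toString());
      } else {
        const record = JSON.parse(message.value.toString());
        const probability = engine.getScore(record);   // placeholder for the runtime's apply call
        // the result would be published back to the CMS via a Kafka producer (omitted here)
      }
    }
  });
}
run().catch(console.error);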
Results
The scoring pipeline with the JavaScript-based scoring for APL and a multiplicity of 5 was used in a standard tenant of SAP Data Intelligence, with over 2000 customer IDs sent simultaneously. All of them were processed in 2.5 seconds, with a single scoring taking between 2 and 6 ms. Below is a snapshot from the SAP Data Intelligence terminal, showing the last part of the time measurement results:
Using the method to score over 2000 customers in 2.5 seconds
Summary and next steps
The approach described in this blog post demonstrates how SAP Data Intelligence can be used to deploy machine learning models, how to automatically update them without interrupting the productive system, and finally, how to enable real-time scoring of the models. The communication between all parts of the solution is based on Kafka messages, which also makes it possible to trigger mass model scoring.
The concept consists of two main pipelines:
- A training pipeline for model training and validation, which is run on demand. This pipeline sends a notification via Kafka messages that a new model has been trained.
- An apply pipeline listening to two Kafka topics – for new data and for new models. The pipeline deploys a Node.js operator to host the JavaScript engine for exported APL models in JSON format and uses multiplicity to parallelize model application.
The advantage of this approach is that it does not contain any use-case-specific parts and can be used as a starting point for many similar problems. Especially applications utilizing IoT and sensor data from the fields of Manufacturing, Utilities, Automotive, Oil & Gas, and Chemicals, but also Banking and Retail, can be a perfect fit. Since the scoring runtime engine is based on JavaScript, it can easily be executed on other platforms, e.g. on edge devices, which would potentially decrease the amount of data that needs to be transferred from sensors to SAP Data Intelligence.
If you have any specific ideas or use cases in mind, or just questions and comments, please feel free to reach out.
At TechEd 2020, Stojan Maleschlijski and Andreas Forster presented and demoed this project, showing how the JavaScript scoring can be embedded with SAP Data Intelligence and Kafka to personalize a website for more targeted marketing. The recording is available on YouTube.