MLOps in practice: Applying and updating Machine Learning models in real-time and at scale
Nowadays, businesses and organizations increasingly rely on real-time machine learning insights to adapt to change and maintain a competitive edge, e.g. by better understanding their customers, by reacting to certain events or disruptions, or simply by predicting future business impacts. The trend is visible across all industries and lines of business, ranging from (i) manufacturing and asset-heavy organizations, which need to react to complex IoT sensor values streamed at high speed, through (ii) banks, which use transaction data to detect credit card fraud on the spot, to (iii) retailers, which apply marketing techniques to determine and deliver the optimal approach for a specific customer at the right time.
All those processes require the output of the machine learning models to be available in real time, so that an appropriate reaction can be triggered within an adequate timeframe. This real-time requirement alone poses a challenge for every IT architecture. Additional factors, such as the need to enrich the data by accessing remote systems (e.g. for business metadata) or to handle a high number of simultaneous requests per second, further increase the complexity of the landscape.
This blog post describes a solution in SAP Data Intelligence that scales to handle several thousand model scoring requests per minute and returns the model result for each request in real time (in less than 5 ms). It also presents an example of a message-based concept for automatically deploying new versions of machine learning models without interrupting the productive system. These model operationalization tasks are often referred to as MLOps and cover the steps after a data scientist has selected and optimized an appropriate model.
Model development is not covered in this article, although SAP Data Intelligence offers a collection of features that support the whole end-to-end machine learning process. Many excellent posts already explain how to use ML Scenarios, how to access data across various landscapes such as SAP S/4 and data lakes, and how to implement a simple CI/CD pipeline.
For that reason, this blog post focuses on the operationalization and scaling of machine learning models. To simplify reading, the concepts are discussed using a real-time marketing use case for a retailer. The objective is to have a machine learning model predict the probability of successfully upselling a given product to a customer while the customer is browsing the website. Based on the model prediction, the website should be adapted dynamically to present suitable content from the best-matching personalized marketing campaign. The technical requirements arising from this scenario are:
- Overall scoring speed: To guarantee a seamless experience for the user, the whole process should be finished in less than 10 ms.
- Scalability: In a realistic scenario, the system should be able to handle several thousand requests per minute.
- Model update: It is crucial to be able to update the underlying models without introducing downtimes in the process.
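To relate the latency and throughput requirements to each other, a rough capacity estimate can help. The following Python function is a back-of-the-envelope sketch, not part of the original solution: it multiplies the arrival rate by the per-request service time to get the number of "busy" scoring instances, then divides by an assumed target utilization to leave headroom. The request rate, service times and the 70 % utilization target are hypothetical example values.

```python
import math


def required_instances(requests_per_minute: float,
                       service_time_ms: float,
                       target_utilization: float = 0.7) -> int:
    """Estimate how many parallel scoring instances are needed.

    Offered load (in busy instances) = arrival rate * service time.
    Dividing by a target utilization below 1.0 keeps queues, and
    therefore waiting time, short.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    arrivals_per_second = requests_per_minute / 60.0
    offered_load = arrivals_per_second * (service_time_ms / 1000.0)
    # Always keep at least one instance running.
    return max(1, math.ceil(offered_load / target_utilization))
```

For example, 6,000 requests per minute (100 per second) at 5 ms per request corresponds to an offered load of 0.5 busy instances, so one instance would suffice on paper; if data enrichment raises the per-request time to 25 ms, the estimate grows to 4 instances.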
The solution is presented below, with emphasis on these three requirements.
High level view and flow of the process
The figure shows an example high-level architecture developed for this use case, along with the components involved.
The main software components are the Content Management System (CMS), an Apache Kafka server and SAP Data Intelligence. Within SAP Data Intelligence, two pipelines are present: the model training pipeline and the model apply pipeline.
The logical flow can be summarized as follows:
- The process is triggered by a customer opening the retailer app, which internally requests the homepage of the retailer.
- The CMS, which is responsible for the webpage content, forwards the customer ID (known through the app), along with additional context-related parameters (e.g. geolocation), to the ML apply pipeline via a Kafka message.
- The apply pipeline consolidates all available data for this customer, e.g. by pulling it from remote systems such as SAP Hybris Marketing.
- The current model is applied to the enriched analytical dataset to calculate the upsell probability.
- Finally, the results are sent back to the CMS, which adapts the content of the webpage (e.g. by adding a special offer) and sends it to the requesting app.
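The request/response cycle above can be sketched as a single message handler that decodes the CMS request, enriches it, scores it and builds the reply. This Python sketch is illustrative only and is not the code of the original pipeline: the JSON field names (`customer_id`, `request_id`, `context`, `upsell_probability`), the in-memory profile table standing in for remote systems such as SAP Hybris Marketing, and the toy scoring formula are all hypothetical.

```python
import json

# Hypothetical stand-in for customer data pulled from remote systems.
CUSTOMER_PROFILES = {"C-1001": {"visits_last_30d": 12, "basket_value": 80.0}}


def enrich(request: dict) -> dict:
    """Merge the request context with the stored customer profile."""
    profile = CUSTOMER_PROFILES.get(request["customer_id"], {})
    return {**request.get("context", {}), **profile}


def score(features: dict) -> float:
    """Toy upsell model: a linear score clipped to [0, 1]."""
    raw = (0.02 * features.get("visits_last_30d", 0)
           + 0.004 * features.get("basket_value", 0.0))
    return min(1.0, max(0.0, raw))


def handle_request(message: bytes) -> bytes:
    """Turn one request message from the CMS into one reply message."""
    request = json.loads(message)
    features = enrich(request)
    reply = {
        "customer_id": request["customer_id"],
        "request_id": request["request_id"],
        "upsell_probability": round(score(features), 3),
    }
    return json.dumps(reply).encode("utf-8")
```

Echoing a request identifier in the reply lets the CMS match each answer to the page request that is waiting for it.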
Overall scoring speed
All communication, e.g. between the training and scoring pipelines, or for bringing new data into the scoring pipeline, was realized using Kafka messages.
Apache Kafka is a widely used tool for streaming event or sensor data in real time. Message distribution follows a publish-subscribe concept involving producers, consumers and brokers. The Kafka broker (server) is the central entity that stores the messages, while producers publish records to specific topics and consumers pull the messages from those topics on the receiving end.
REST-based APIs were also tested but did not provide all the flexibility needed for the use case. The Kafka message approach behaved very stably and was able to handle the incoming traffic within the required time interval. The out-of-the-box Kafka Consumer and Kafka Producer operators in SAP Data Intelligence make it easy to set up the communication: the only information needed is the broker address and the message topic.
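To make the publish-subscribe concept more concrete, the following toy model in Python mirrors the roles described above: a broker keeps one append-only log per topic, producers append records to it, and each consumer pulls records starting from its own offset. It is a simplified in-memory illustration, not Kafka's API, and it deliberately omits partitions, replication and consumer groups.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy broker: each topic is an append-only log of records."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, record):
        """Producer side: append a record to the topic and return its offset."""
        log = self._topics[topic]
        log.append(record)
        return len(log) - 1

    def fetch(self, topic, offset, max_records=10):
        """Consumer side: return up to max_records records starting at offset."""
        return self._topics[topic][offset:offset + max_records]


class Consumer:
    """A consumer that tracks its own read position (offset) in one topic."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Pull new records and advance the offset past them."""
        records = self.broker.fetch(self.topic, self.offset)
        self.offset += len(records)
        return records
```

Because the broker only stores records and each consumer remembers where it stopped, two independent consumers of the same topic both receive every record, and a consumer that polls again only sees what arrived in the meantime.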
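To run the JavaScript-based scoring code inside the pipeline, a custom Docker image is built on top of the SAP Data Intelligence base image, and the required JavaScript files are copied into it:

```dockerfile
# SAP Data Intelligence base image
FROM $com.sap.sles.base
RUN mkdir /tmp/JS/
# Copy the JavaScript files needed by the scoring code into the vflow user's home directory
COPY --chown=vflow:vflow amdefine.js /home/vflow/amdefine.js
COPY --chown=vflow:vflow autoRuntime.js /home/vflow/autoRuntime.js
COPY --chown=vflow:vflow dateCoder.js /home/vflow/dateCoder.js
COPY --chown=vflow:vflow utils.js /home/vflow/utils.js
```

After the Docker image is built and tagged, the tags need to be added to the group of the scoring operator.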
With these settings, it is possible to host and execute the JS-based scoring code in an SAP Data Intelligence pipeline. This allows the data and the scoring to remain within the system, which minimizes data movement and reduces the overall execution time.
Another way to improve execution speed is parallelization. The multiplicity parameter of an operator group in SAP Data Intelligence instructs the system to execute that part of the pipeline in parallel. For this purpose, the existing scoring pipeline was extended with a new group and a Python operator:
In the settings of the Scoring group it is possible to specify a multiplicity value as shown in the following figure:
Setting a multiplicity value of 5 for the Scoring group causes SAP Data Intelligence to create 5 independent pods that run this part of the pipeline in parallel. Each pod loads the current model and waits for incoming data to be scored. As a result, 5 model instances are ready to process new data points, which parallelizes the scoring task and accelerates the end-to-end flow. SAP Data Intelligence manages the creation, start and stop of the parallel pods, as well as the distribution of data to the available pods, on behalf of the user.
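The effect of a multiplicity setting can be illustrated with worker threads in plain Python: each worker loads its own model instance once and then takes requests from a shared queue, so whichever worker is free handles the next request. This is only an analogy under stated assumptions; SAP Data Intelligence runs separate pods and distributes the data itself, and the `load_model` function and toy model below are hypothetical.

```python
import queue
import threading


def load_model():
    """Hypothetical loader: returns a toy model mapping a number to a score in [0, 1]."""
    return lambda x: min(1.0, 0.1 * x)


def worker(inbox, results, lock):
    model = load_model()                 # each worker loads its own instance once, like each pod
    while True:
        item = inbox.get()
        if item is None:                 # sentinel: no more work for this worker
            break
        request_id, features = item
        prediction = model(features)
        with lock:
            results[request_id] = prediction


def score_in_parallel(requests, multiplicity=5):
    """Score (request_id, features) pairs with `multiplicity` parallel workers."""
    inbox = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(inbox, results, lock))
               for _ in range(multiplicity)]
    for t in threads:
        t.start()
    for item in requests:
        inbox.put(item)                  # whichever worker is free takes the next item
    for _ in threads:
        inbox.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Note that in CPython, threads only speed up CPU-bound scoring to a limited extent because of the global interpreter lock; separate processes, comparable to separate pods, are needed for a real speed-up in that case.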
Model update
Model update is an important MLOps topic. On the one hand, models need to be updated regularly, at a frequency that depends on factors such as the dynamics of the process being modeled. On the other hand, applications of models in production should not be interrupted to update the model version. Additionally, for many use cases, the process applying the model needs to be aware that a new model is available so that a reload can be scheduled.
The figure below shows a simplified version of a training and validation pipeline, which uses Kafka messages (e.g. via a NewModel topic) to notify potential application pipelines that a new model has been trained and is now available in the repository.
The application pipeline listens to the NewModel topic, and as soon as a message is received, it triggers a model loading process. Once the model has been loaded from the repository, it is immediately used to predict the upsell probability for the next customer request that reaches the system.
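The notification step of such a training and validation pipeline can be sketched in a few lines of Python. The sketch assumes that validation uses AUC with a fixed threshold; the metric, the threshold, the repository dictionary and the list standing in for the NewModel topic are all hypothetical and not taken from the original pipeline.

```python
import json
import time
from typing import Optional

MODEL_REPOSITORY = {}   # hypothetical stand-in for the model repository: version -> model bytes
NEW_MODEL_TOPIC = []    # hypothetical stand-in for the NewModel Kafka topic


def publish_if_valid(model_bytes: bytes, auc: float, min_auc: float = 0.75) -> Optional[int]:
    """Store a model that passes validation and announce it on the NewModel topic."""
    if auc < min_auc:
        return None                              # rejected models are never announced
    version = len(MODEL_REPOSITORY) + 1
    MODEL_REPOSITORY[version] = model_bytes      # 1) persist the model first ...
    NEW_MODEL_TOPIC.append(json.dumps({          # 2) ... then notify subscribers
        "event": "NewModel",
        "version": version,
        "auc": auc,
        "published_at": time.time(),
    }))
    return version
```

Storing the model before publishing the message means that any pipeline reacting to the notification can always find the announced version in the repository.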
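A reload without downtime can be sketched as a reference swap: the new model is fully loaded first while the old one keeps serving requests, and only then is the current-model reference replaced. This Python sketch is an assumption-based illustration, not the original pipeline's code; the message format with a `version` field and the `load_model` callback are hypothetical.

```python
import json
import threading


class ModelHolder:
    """Holds the current (version, model) pair; scoring never waits for a reload."""

    def __init__(self, model, version):
        self._lock = threading.Lock()
        self._current = (version, model)

    def predict(self, features):
        version, model = self._current      # read version and model together as one pair
        return version, model(features)

    def swap(self, model, version):
        with self._lock:                    # serialize concurrent reloads
            if version <= self._current[0]:
                return False                # ignore stale or duplicate notifications
            self._current = (version, model)
            return True


def on_new_model(holder, message, load_model):
    """Handle one NewModel message: load first, then swap the reference."""
    version = json.loads(message)["version"]
    model = load_model(version)             # slow part; the old model keeps serving meanwhile
    return holder.swap(model, version)
```

Keeping version and model in a single tuple ensures that a request is always scored by a consistent pair, and the version check makes repeated notifications harmless.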
Summary and next steps
The approach described in this blog post demonstrated how SAP Data Intelligence can be used to deploy machine learning models, to update them automatically without interrupting the productive system, and to score them in real time. The communication between all parts of the solution is based on Kafka messages, which also makes it possible to trigger mass model scoring.
The concept consists of two main pipelines:
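- A training pipeline for model training and validation, which is run on demand. This pipeline uses Kafka messages to send a notification that a new model has been trained.
- An application (scoring) pipeline, which listens for new-model notifications, reloads the model without interrupting operation and scores incoming requests in real time, in parallel if required.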
If you have any specific ideas or use cases in mind, or just questions and comments, please feel free to reach out.
This blog post is based on a customer project done together with Andreas Forster, and the contents will also be shown at TechEd 2020 in the following session: TechEd 2020: Application of Machine Learning Models in Real Time and at Scale [INT114]