MLOps (from Machine Learning and Operations) refers to the process of managing the production lifecycle of machine learning models, including the collaboration between data scientists, data engineers and IT professionals. The objective is to define recommendations and best practices that automate the process, comply with regulatory requirements and provide the agility to react to changing business requirements. Even though this is mainly a technical discipline, companies that do not implement MLOps practices efficiently also face a number of business and financial challenges.
In this blog post I describe the challenges caused by inefficient MLOps that we have encountered in several customer engagements and show how they can be addressed with SAP Data Intelligence, hoping to provide guidance for others in similar situations.
1. Machine learning operations challenges
The main challenges in developing machine learning models, bringing them into production and monetizing the extracted insights, together with their business impact, can be summarized in the following points:
CH1: Collaboration within the Data Science (Analytics) team
It is common practice in data science teams to develop on local machines and distribute the developed models via shared drives or even email. This reduces operational efficiency and can lead to silos and collaboration barriers, delaying model creation. Such delays eventually lead to missed financial opportunities, especially when the models are time dependent and quickly become outdated. For example, if a team is working on customer analytics to target the right candidates for a new product offer as a reaction to competitor activity, a delay in model creation means that customers are not engaged in a timely manner and may churn to the competition. This is a good example of how team operations are linked directly to financial loss.
CH2: Model consumption across the organization
Once the analytical models are created, they need to be easy to consume and share across the organization. A bottleneck in model accessibility delays the point at which the analytical insights generate value. This again is directly linked to various business and financial issues.
CH3: Model lifecycle management
The inability to provide an adequate environment for model deployment, update and management creates financial risks for the organization due to outdated and, in the worst case, wrong models. On top of the business risks, such models could also damage the company's image and standing.
CH4: Performance monitoring
The inability to monitor model performance and act upon critical alerts in a timely manner can lead to substantial financial loss.
CH5: Access restrictions and auditing logs
Inadequate user and access management leads to a lack of transparency and version control, both of which are critical, especially for regulatory and auditing purposes.
CH6: Privacy-related concerns with business data
Even though this point is not directly related to MLOps, it is relevant for the end-to-end storyline. Extracting business data from productive systems and distributing it across the organization poses a privacy-related risk as well as a business risk due to the sensitive and critical nature of such data.
In the next section I will review how standard capabilities of SAP Data Intelligence can be used to address those challenges.
2. Standard MLOps capabilities of SAP Data Intelligence
Figure 1: Pseudo flow graph of an end-to-end machine learning process
Figure 1 represents the high-level process of bringing machine learning models into production and managing their lifecycle. SAP Data Intelligence currently offers out-of-the-box functionality for each of those points. There are several articles online describing the features in much more detail, e.g. the SAP Data Intelligence blog series or Nidhi’s blog post, so I will be brief on the standard capabilities and how they relate to the challenges defined above:
- With its prepared connectors to standard systems, SAP Data Intelligence can access various data sources and manage those remote systems in a central place. (CH1: Collaboration within Data Science Teams, CH6: Privacy-related concerns with business data)
- Using the concept of ML scenarios, SAP Data Intelligence unites several artifacts that belong together into a virtual container, the scenario. The scenario holds ML experiment data, documentation, experimental runs, model KPIs and API deployments in one place. Changes to artifacts are tracked and user-based access control is implemented. (CH1: Collaboration within Data Science Teams, CH2: Model consumption across organization, CH3: Model lifecycle management, CH5: Access restriction and auditing logs)
- The scenario manager is an application in SAP Data Intelligence that automates model training and model deployment via templated pipelines, aiming for a seamless transition from the prototyping phase (e.g. Jupyter Notebook) into production. (CH2: Model consumption across organization)
- Using the integrated data lake, it is possible to store models in an internal repository. (CH3: Model lifecycle management)
- There is flexibility on which model KPIs are monitored and reported in the scenario manager. (CH4: Performance monitoring)
- If SAP HANA is used as the data source, it is possible to apply the models in the database layer, thus avoiding the unnecessary movement of critical business data. The Python interface to HANA SQLScript, called hana_ml, allows data scientists to develop directly in Python. (CH1: Collaboration within Data Science Teams, CH6: Privacy-related concerns with business data)
- Retraining is supported in a semi-automatic manner and will be discussed in the next section. (CH3: Model lifecycle management)
As can be seen from this list, the standard features cover a large part of the challenges described above.
SAP Data Intelligence also allows its basic functionality to be extended. In the following section I will discuss two such extensions: how to implement automated retraining (Point 9 in Figure 1) and the creation of a performance monitoring dashboard with SAP Analytics Cloud.
3. Extending the MLOps capabilities of SAP Data Intelligence
With the pipelining concept of SAP Data Intelligence it is possible to implement the training process as a pipeline and trigger the execution of this pipeline based on different rules or external factors. The following retraining approaches are most commonly used:
Figure 2: Schedule-based retraining is possible out-of-the-box in SAP Data Intelligence
In the schedule-based approach, retraining happens on a regular basis, e.g. once per month. It does not involve any monitoring of the model performance or data distribution, so it is suitable only when certain conditions are met (e.g. when a repeatable process is being modeled and the properties of the data change at a predefined interval).
Figure 3: Data-based retraining
Oftentimes the properties of data evolve with time, potentially introducing previously unseen variety and even new categories. This is called data drift. On the other hand, the meaning of labels can also change with time, i.e. our interpretation of the data changes even while the general distribution of the data does not. This is known as concept drift. Depending on the model type and model performance as well as on the underlying processes, these two drift types can be uncritical or very harmful for the overall model performance. The data distribution and its statistical properties are only a proxy for the underlying process and can be misleading, so this method is recommended only if direct monitoring of the model performance is challenging and slow (e.g. involving simulation steps).
The main idea in data-based retraining is to extract statistical descriptors of the data used for training and store them for reference during the apply phase. When the model is applied, the same statistical parameters are extracted from the new data and compared with the reference properties. If they differ significantly, a new retraining cycle needs to be triggered.
As an example, a handy way to create the reference statistics is the univariate_analysis function from SAP HANA PAL (Predictive Analysis Library), used as follows:
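from hana_ml.algorithms.pal import stats

# Returns two result tables: statistics for the continuous columns
# and statistics for the categorical columns of input_table.
statsContinous, statsCategorical = stats.univariate_analysis(conn, data=input_table, key='NEW_ID', cols=relevant_cols)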
The process of data loading and training in pipeline format can be implemented as shown in the following example:
Figure 4: The data loading and training pipelines example
As explained in Figure 3, while new data is loaded into the system, its statistical properties are extracted and validated against the reference statistics stored during training. If the validation determines that they differ, a new training run is triggered (for simplicity, some of the operators and data model information are not shown).
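The validation step can be sketched in plain Python as follows. This is only a minimal illustration of the idea, not the logic of the pipeline shown in Figure 4: the function names, the choice of mean and standard deviation as descriptors, and the tolerance value are all assumptions made for this example.

```python
import statistics


def describe(values):
    """Extract simple statistical descriptors from a numeric column."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def drift_detected(reference, current, tolerance=0.5):
    """Flag drift when the mean has moved by more than `tolerance`
    reference standard deviations (hypothetical rule for this sketch)."""
    mean_shift = abs(current["mean"] - reference["mean"])
    return mean_shift > tolerance * reference["stdev"]


def unseen_categories(reference_categories, current_values):
    """Return the categories in the new data that were absent in training."""
    return set(current_values) - set(reference_categories)


# Reference statistics are stored once, at training time.
reference_stats = describe([10.0, 12.0, 11.0, 13.0, 9.0])

# At apply time the same descriptors are extracted from each new batch.
similar_batch = describe([10.5, 11.5, 12.0, 9.5, 11.0])
shifted_batch = describe([20.0, 22.0, 21.0, 23.0, 19.0])

for name, batch in [("similar", similar_batch), ("shifted", shifted_batch)]:
    action = "trigger retraining" if drift_detected(reference_stats, batch) else "keep model"
    print(name, "->", action)
```

In practice the descriptors and thresholds would be chosen per column and per use case; a single global tolerance is used here only to keep the sketch short.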
Figure 5: Model-metric-based retraining
The approach here is to apply the model and store the predictions. As soon as the actual values for the predicted parameter are available, they can be compared with the predictions and a metric for the model performance can be derived. This retraining method is the most reliable because it directly measures the model performance on live data. The deterioration of the model performance is sometimes referred to as model drift. This method has one disadvantage if the predictive horizon is long: the model runs without validation for the full duration of the horizon before the first check can happen. For example, when predicting the deposit levels in a checking account one month in advance, the first validation can be performed only after one month, once the actual deposit levels become available. From that point on, it is a rolling process.
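A minimal sketch of such a metric-based trigger is shown below. The choice of mean absolute error as the metric, the function names and the degradation factor are assumptions made for illustration; any metric that fits the model type could be used in the same way.

```python
def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def needs_retraining(actuals, predictions, baseline_mae, max_degradation=1.5):
    """Trigger retraining when the live error exceeds the error measured at
    training time by more than the factor `max_degradation` (hypothetical rule)."""
    live_mae = mean_absolute_error(actuals, predictions)
    return live_mae > max_degradation * baseline_mae


# Error measured on the validation set at training time (illustrative value).
baseline_mae = 10.0

# Actual values that became available after the predictive horizon has passed.
actuals = [100.0, 110.0, 120.0]

# Stored predictions from a model that still performs well ...
good_predictions = [105.0, 100.0, 125.0]
# ... and from a model whose performance has deteriorated.
poor_predictions = [140.0, 70.0, 160.0]

print(needs_retraining(actuals, good_predictions, baseline_mae))
print(needs_retraining(actuals, poor_predictions, baseline_mae))
```

Because actual values arrive continuously once the first horizon has passed, the same check can be repeated on every new batch of actuals as a rolling process.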
Advanced performance monitoring
Finally, when it comes to performance monitoring, it is possible to create custom dashboards with SAP Analytics Cloud. Here is an example of such a dashboard, which provides a visual tool for monitoring the operation of pipelines:
Figure 6: Example dashboard in SAP Analytics Cloud, showing the pipeline operations statistics
With this blog post I hope to have demonstrated a somewhat different view of SAP Data Intelligence, from the MLOps perspective. This topic is oftentimes underestimated by institutions because it is considered rather technical and not directly related to business outcomes. Experience shows that by improving operational efficiency customers can avoid serious financial and regulatory issues, thus directly improving their business outcomes. This can be done in an end-to-end manner using the out-of-the-box functionality of SAP Data Intelligence as well as the extensibility of the system.
In a future blog post I will discuss each of the extension capabilities in more detail, as well as the topic of continuous integration / continuous deployment (CI/CD).
That is all for now. I hope the concepts and thoughts discussed here provide some guidance, especially on how to use SAP Data Intelligence as an MLOps platform.
A side note: SAP Data Intelligence is a quickly evolving product, constantly adding new features, so please be sure to scan the blog post updates and release notes for news.