Lessons Learnt from Operational Deployment of Automated Machine Learning
It has been a while now since we integrated intelligent features deep into SAP applications such as ‘Opportunity Scoring’ and ‘Lead Scoring’ in C/4 Sales, or ‘Forecast Delays in Stock in Transit’ for S/4 Sourcing and Procurement, to name just a few out of dozens.
Now we have live customers leveraging these smart processes, and thanks to the C/4 professional services teams, we can begin to draw lessons from these first operational deployments of automated machine learning. This truly demonstrates the Intelligent Enterprise in motion and corresponds to the ‘optimize’ way of delivering SAP Leonardo innovations through deep embedding within SAP applications.
So, in a nutshell, here are the findings:
- Machine Learning can’t be a black-box
- Clean data beats more data and more data beats better algorithms
- A model is never perfect, but it can always be useful
- Machine Learning in operations is a way to measure the maturity of the underlying process implementation
We will dive into each of these aspects in the rest of this blog.
Machine learning can’t be a black-box
Let’s face it: we are still at the stage where business decision-makers need to build their own trust in machine learning systems before letting the machines take responsibility for operational or tactical decisions. In our example, opportunity scoring, the main decision to be made is: “Which opportunities should my team focus on to convert them into deals/customers, and what should be done next for each of them?”
Most sales managers will not trust at first sight a machine-generated score that ranks their opportunities. So we need to help these sales managers decide to use these systems in operations. We are entering what is called ‘Explainable Artificial Intelligence’, or ‘Explainable AI’. What can we do?
- First, we can use the findings of the modeling techniques, such as the notions of ‘Key Influencers’ or ‘Outliers’, to show these decision-makers which data elements most influence, in general, whether an opportunity becomes a customer or not. The idea underlying this process is the same as in ‘Data Discovery’, and therefore, soon, predictive and machine learning models should be accessible from SAP Analytics Cloud, which offers nice data exploration techniques to help build trust in these findings. Data does not lie, but it can introduce bias if it is not properly filled in or used in the right context.
- Second, we can provide explanations of how each individual score is computed. The same techniques used in credit scoring to explain to someone why they have been refused a credit line can be used to explain any score that is generated. In our previous example, we can pinpoint the two data elements that push the score of an opportunity down (for example, this opportunity has been in a given phase for too long), and this becomes a direct call to action for the sales representative (assuming they will not ‘trick’ the system and just change a field without taking the proper underlying action…)
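As a rough illustration of this second point, here is a minimal sketch of per-score explanations in the spirit of credit-scoring “reason codes”. It assumes a simple logistic model on synthetic data; the feature names (days in phase, contact count, deal size) are hypothetical, not the actual fields used in Opportunity Scoring:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["days_in_phase", "contact_count", "deal_size"]
X = rng.normal(size=(200, 3))
# Synthetic outcome: long phases hurt, more contacts help.
y = (X[:, 1] - X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain_score(x, top_n=2):
    """Return the top_n data elements pushing this opportunity's score down,
    measured as the signed contribution relative to the dataset average."""
    contributions = model.coef_[0] * (x - X.mean(axis=0))
    order = np.argsort(contributions)  # most negative contribution first
    return [(feature_names[i], round(contributions[i], 3)) for i in order[:top_n]]

# For one opportunity, show the two data elements pulling its score down.
print(explain_score(X[0]))
```

Each named element then becomes a concrete call to action for the sales representative.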
Why are we doing this? Because the training process of these automated machine learning techniques is purely automated, and the more insights you show in a way that a business user can consume, the more likely you are to find unexpected results or insights that trigger the right questions about how this data is filled in or used during the business process. When the business stakeholders have asked all their questions, they are ready to go live; but they need help asking the proper questions, and therefore it is crucial to show them as many findings as possible that the mathematics has extracted from the data.
Clean data beats more data and more data beats better algorithms
Very often, when I talk about embedded machine learning automation, the first question I am asked is, “What algorithms are you using?” This is the wrong FIRST question. The reason is that, when we talk about structured data, machine learning automation is a solved problem: you can use the SAP proprietary IP coming from KXEN or state-of-the-art algorithms such as gradient boosting, and both will get you very good results with no need for external tuning. (Check out the results of recent Kaggle challenges and you will see that the leaderboards are filled with XGBoost.)
The next frontier is in feature engineering and in having wider datasets, meaning more variables/columns/attributes to describe your customers, or your opportunities in the previous examples. Therefore, more data beats better algorithms today. Moreover, these automated algorithms are resistant to phenomena such as overfitting on wide datasets, so there is no drawback to using them and adding more columns to the dataset used to train predictive models. And then, it should be obvious that clean data beats everything: data quality is the main driver of predictive accuracy.
A model is never perfect, but it can always be useful
The last point about data quality should not be misinterpreted. I have heard some consultants say, “You need data of perfect quality before trying to use predictive and machine learning,” and I do not agree with that. Of course, data of good quality will always lead to better predictive accuracy and robustness than poor-quality data.
But you can use machine learning, especially robust techniques, even on poor quality data for two main reasons:
- If we take the previous example, ranking opportunities will always help your sales team focus on the most promising ones: having a guiding principle is always better than attacking at random, and often better than pure intuition. Data scientists compare predictive accuracy between different techniques and try to shoot for the most accurate, but business users must focus on the impact in terms of business, usually better margins through better focus. Anything that is better than random will improve your margin! Of course, this assumes that you use robust techniques and, as presented in the previous section, such robust algorithms exist today.
- Because most of these techniques can lead you to notions such as ‘key influencers’ or ‘outliers,’ you know which variables most impact the outcome, even with low accuracy, so you know where to focus your quality requirements first—on the key influencers. Furthermore, the notion of outliers allows you to concentrate on the data elements that don’t seem right. Where else should you start your quality improvement journey?
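The first point above can be sketched numerically: rank opportunities by a deliberately noisy, imperfect score and compare the win rate the team reaches in its top 20% against the random baseline. The data below is synthetic and the 20% conversion rate is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
won = rng.random(n) < 0.2                  # ~20% of opportunities convert
# Noisy score: correlated with the outcome, but far from perfect.
score = won.astype(float) + rng.normal(scale=1.0, size=n)

top = np.argsort(-score)[: n // 5]         # focus on the top 20% by score
hit_rate_model = won[top].mean()
hit_rate_random = won.mean()               # expected rate with no ranking

print(f"model top-20% win rate: {hit_rate_model:.2f}")
print(f"random baseline:        {hit_rate_random:.2f}")
```

Even this mediocre model concentrates noticeably more wins into the team’s focus list than random selection would, which is exactly the “useful despite imperfect” argument.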
Machine Learning in operations is a way to measure the maturity of the underlying process implementation
What we have seen is that the predictive performance of opportunity scoring is directly linked to the implementation of the sales process, in other words, to the maturity of the organization implementing this process. This was especially true for opportunity scoring, since a fair amount of the data comes from direct input by the sales representatives, which can be incomplete, noisy, and sometimes even voluntarily biased.
The predictive accuracy and robustness of the models trained on your data are a very good indicator that can be used to benchmark your organization against other organizations. This is something that can be provided as an extra service of the Intelligent Enterprise delivered as SaaS.
In summary, that is it! We are in! The Intelligent Enterprise is in motion and we have live customers leveraging it! New avenues for findings on new frontiers lie ahead!
Learn more about what’s possible with predictive analytics today by reading our other machine learning blogs.