This post is complementary to the session INT200: Do-It-Yourself Machine Learning (click to watch the video 📺) from SAP TechEd 2020. You are welcome to watch that recorded session and its demos, but it is not a prerequisite for following this post.
Are you a developer hearing a lot about
the Machine Learning hype, but not familiar with it?
Would you like to infuse that “intelligence”
everyone is talking about,
but are lost on where to start?
Do you learn faster by coding it yourself
than by listening to others?
I decided to present that SAP TechEd INT200 session and to publish this post to try to answer these questions. Please let me know in the comments whether I succeeded and which of your questions are still unanswered.
You won’t find a complete learning course here. The session aimed to introduce basic concepts while staying light on slides and heavy on hands-on demos. So this post mostly contains links to content, tools, and code you can use for free to practice!
What is Machine Learning?
Well, there are as many answers to this question as people you ask.
On one hand, there are people who need to talk the talk about some new hype every year, and you may find they are referring to the same things that used to be called predictive analytics, data mining, or even simply statistics…
On the other hand, there are people, whether they were called statisticians back then or data scientists now, who have walked the walk for years. What has changed is that the most advanced algorithms are now easier to access, and hardware advances make it possible to run them on big data in real time.
As consumers, we all interact with ML every day. We are so used to it that we do not think about it, but something as ubiquitous as your phone framing your face and focusing on it while you take a selfie is possible thanks to a deep learning algorithm running on the high-resolution live camera image, powered by the hardware packed into a slim smartphone!
In the enterprise context, Machine Learning primarily focuses on discovering new insights from business data, such as detecting fraudulent cases, predicting shipments, or recommending products.
The ML model is the central piece of Machine Learning. In very simple terms, it takes input data and returns output data, for example:
- getting a matrix of numbers (encoded pixels of a picture) to produce the coordinates of the bounding box (that frames one’s face on the picture), or
- getting a time series of historical sales numbers to produce a prediction of future sales numbers based on trend patterns discovered in that history.
These ML models can be built using a number of different algorithms representing different statistical or mathematical calculations, like
which is used in one of the clustering algorithms.
Generally speaking, an ML model receives input data, called features, to calculate predicted values, called labels.
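To make “features in, labels out” concrete, here is a minimal sketch in plain Python, assuming a single feature (the month number) and a single label (units sold). It fits a straight-line trend with ordinary least squares, about the simplest algorithm there is. The data and the function names `fit_linear` and `predict` are made up for illustration and are not part of any SAP library.

```python
# Minimal "features in, labels out" sketch: fit a straight-line trend
# (ordinary least squares with one feature) to made-up monthly sales.

def fit_linear(xs, ys):
    """Training: return (slope, intercept) minimizing the squared error of y ≈ slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Inference: turn new feature values into predicted labels."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

# Hypothetical training data: feature = month number, label = units sold.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

model = fit_linear(months, sales)   # learned parameters: (10.0, 90.0)
forecast = predict(model, [7, 8])   # features for the next two months
print(forecast)  # [160.0, 170.0]
```

The “model” here is nothing more than two learned numbers; more powerful algorithms learn many more parameters, but the shape of the workflow (train on known features and labels, then predict labels for new features) stays the same.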
Good news: you do not need to code all of this yourself. If your data is stored in SAP HANA, then it has a built-in…
Predictive Analysis Library (PAL)
PAL includes classic and universal predictive analysis algorithms in the most popular categories (please see Top Data Science and Machine Learning Methods Used in 2019 at KDnuggets), like:
- Time Series
- Social Network Analysis
- Recommender System
The Predictive Analysis Library defines functions that can be called from within SQLScript procedures, or via Python or R interfaces, to run advanced analytic algorithms. You can find many code examples to try at:
And you can practice these on a free SAP HANA, express edition.
Keep in mind, though, that the quality of the trained model depends heavily on the selected algorithm and its parameters, which you need to tune. This example from scikit-learn shows how the same datasets can be split into different clusters depending on the chosen method and parameters.
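The effect of a parameter is easy to reproduce on a small scale. The sketch below is a plain-Python k-means (Lloyd's algorithm), not the scikit-learn or PAL implementation; the six points and the deterministic initialization are assumptions chosen so the output is reproducible. It runs the same algorithm with different values of the parameter `k` and reports the inertia (the sum of squared distances from each point to its nearest cluster center).

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); returns (centroids, inertia)."""
    # Deterministic start for reproducibility: use every (len(points) // k)-th point.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

results = {}
for k in (1, 2, 3):
    _, inertia = kmeans(points, k)
    results[k] = inertia
    print(f"k={k}: inertia={inertia:.2f}")
# k=1: inertia=302.67
# k=2: inertia=2.67
# k=3: inertia=1.83
```

The inertia tends to drop as `k` grows, so picking the “best” value still takes judgment: here the big drop happens between k=1 and k=2, matching the two visible groups, while k=3 only splits an existing group.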
Training multiple models with different algorithms and parameters to find the one that produces the best results can take a lot of time.
To reduce the time required for that, SAP provides…
Automated Predictive Library (APL)
APL, as the name suggests, automates several steps of creating and training models to obtain the best model for analyzing your new data.
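APL's own interfaces are not shown here. As a rough, hypothetical illustration of automated model selection, the sketch below trains a few simple forecasting candidates on the older part of a made-up sales series, scores each on the newest points it has not seen, and keeps the one with the lowest mean absolute error. The candidate models, the holdout size, and the error metric are all assumptions.

```python
# Hypothetical candidate models: each takes a training series and a horizon
# and returns that many future predictions.

def mean_model(history, horizon):
    """Predict the historical average for every future step."""
    avg = sum(history) / len(history)
    return [avg] * horizon

def last_value_model(history, horizon):
    """Predict that the last observed value simply repeats."""
    return [history[-1]] * horizon

def linear_trend_model(history, horizon):
    """Fit a least-squares line over the time index and extrapolate it."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + h) for h in range(horizon)]

CANDIDATES = {
    "mean": mean_model,
    "last_value": last_value_model,
    "linear_trend": linear_trend_model,
}

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def auto_select(series, holdout=3):
    """Train every candidate on the older data, score it on the newest `holdout` points."""
    train, test = series[:-holdout], series[-holdout:]
    scores = {
        name: mean_absolute_error(test, model(train, holdout))
        for name, model in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Made-up monthly sales with a steady upward trend.
sales = [100, 104, 111, 115, 119, 126, 130, 134, 141]
best, scores = auto_select(sales)
print(best)  # linear_trend
```

A real tool would also search parameters, validate more carefully, and refit the winner on the full history; the point here is only the loop of “train several, score on unseen data, keep the best”.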
The sample code is available for you to try out:
You need to install APL in your SAP HANA, express edition if you want to try it out.
The entire ML process
… does not end when the model is created. The model then needs to be productized, e.g. as an API for developers to call from their applications, or as an SQLScript procedure or Python object for analysts to use in their notebooks, like JupyterLab.
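As one possible way to expose a model to developers, here is a minimal sketch of a JSON-over-HTTP endpoint using only Python's standard library. The parameters in `MODEL`, the `/predict` path, and the request and response shapes are hypothetical; a production deployment would normally rely on dedicated serving infrastructure rather than a hand-rolled server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trained parameters, e.g. the slope/intercept of a sales trend line.
MODEL = {"slope": 10.0, "intercept": 90.0}

def handle_predict(body):
    """Map a JSON request {"features": [...]} to a JSON response {"predictions": [...]}."""
    request = json.loads(body)
    predictions = [MODEL["slope"] * x + MODEL["intercept"] for x in request["features"]]
    return json.dumps({"predictions": predictions})

class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer: POST /predict forwards the request body to handle_predict."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = handle_predict(self.rfile.read(length).decode("utf-8")).encode("utf-8")
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing "features" key, or non-numeric features.
            self.send_error(400, "expected a JSON body with a 'features' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(host="127.0.0.1", port=8000):
    """Block and serve predictions until interrupted (Ctrl+C)."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Calling `serve()` starts the server on localhost:8000; a POST of `{"features": [7, 8]}` to `/predict` then returns `{"predictions": [160.0, 170.0]}`. Keeping the model logic in `handle_predict`, separate from the HTTP class, makes it easy to test or to reuse the same function from a notebook.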
A model's quality has to be monitored continuously, as it can degrade over time: the business environment changes, and a model trained on data from past years might not fit the current situation well. Tools like SAP Data Intelligence can be used to support this entire Machine Learning process.
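Monitoring can start very simply. The sketch below is a hypothetical rule, not a feature of any particular tool: it keeps the latest absolute errors in a fixed-size window and raises a flag once their average exceeds the error measured at deployment time by a chosen factor. The window size of 4, the tolerance of 1.5, and the observations are arbitrary assumptions.

```python
from collections import deque

class ErrorMonitor:
    """Track a model's recent absolute errors and flag possible degradation.

    Hypothetical rule: alert when the mean absolute error over the last `window`
    observations exceeds `tolerance` times the error measured at deployment.
    """

    def __init__(self, baseline_mae, window=4, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # keeps only the latest `window` errors

    def record(self, actual, predicted):
        self.recent.append(abs(actual - predicted))

    def recent_mae(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def degraded(self):
        # Only judge once the window is full, to avoid alerts based on too few points.
        return (
            len(self.recent) == self.recent.maxlen
            and self.recent_mae() > self.tolerance * self.baseline_mae
        )

monitor = ErrorMonitor(baseline_mae=2.0)
# Made-up (actual, predicted) pairs as new business data arrives; the last ones drift away.
observations = [(100, 101), (105, 103), (110, 111), (115, 117), (130, 118), (140, 121)]
alerts = []
for actual, predicted in observations:
    monitor.record(actual, predicted)
    alerts.append(monitor.degraded())
print(alerts)  # [False, False, False, False, True, True]
```

Production setups typically also watch the input data itself for shifts in feature distributions, since the actual values needed to compute errors often arrive with a delay.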
Instructions for deploying and working with a free trial version of SAP Data Intelligence are available here: https://developers.sap.com/topics/data-intelligence.html.
As mentioned, choosing, training, and tuning an ML model can be tedious, while the resulting model can be easy to use across many use cases. To help customers, SAP offers a continually growing number of ready-to-use models supporting common business processes, called…
SAP AI Business Services
These services provide pre-trained models or pre-configured neural networks to digitize activities like
- Document Information Extraction
- Data Attribute Recommendation
- Business Entity Recognition
and more. You can enable these services in the SAP Cloud Platform and then call them using their APIs or the provided SDKs.
To try them out in a trial account, check the tutorials available at:
It is common to believe that ML models are some kind of supernatural black boxes that conjure magic out of thin air. As we discussed at the beginning, behind each ML model there is some math or statistics, and an algorithm used to build it.
The issue with ML “black boxes” has recently become even more important, both in society (to avoid the infamous bias in AI) and in business (to build confidence in decisions generated by AI).
A famous example was demonstrated in the paper “Why Should I Trust You?”, where the authors showed a Husky vs Wolf classifier that happened to have been trained to recognize… snow in the pictures to label them as “wolves”!
Explainability in AI
… is the area that focuses on addressing this issue. Our colleagues from SAP AI Applications addressed it by creating a Python package, Contextual AI.
Earlier this year they released it as open source:
and you are welcome to contribute!
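Contextual AI's own API is not shown here. To illustrate the kind of question explainability tools answer, the sketch below applies permutation feature importance, a generic model-agnostic technique (different from LIME, the method introduced in the “Why Should I Trust You?” paper), to a hypothetical husky vs. wolf dataset with hand-made binary features. A one-rule classifier trained on the deliberately biased data latches onto `snow`, and shuffling each feature in turn shows which features its predictions actually depend on.

```python
import random

FEATURES = ["snow", "pointy_ears", "light_eyes"]

# Hypothetical, deliberately biased training photos: every wolf was shot in snow.
# Each row: binary features (1 = present, 0 = absent) and the true label.
ROWS = [
    ({"snow": 1, "pointy_ears": 1, "light_eyes": 0}, "wolf"),
    ({"snow": 1, "pointy_ears": 1, "light_eyes": 0}, "wolf"),
    ({"snow": 1, "pointy_ears": 0, "light_eyes": 1}, "wolf"),
    ({"snow": 1, "pointy_ears": 1, "light_eyes": 0}, "wolf"),
    ({"snow": 0, "pointy_ears": 1, "light_eyes": 1}, "husky"),
    ({"snow": 0, "pointy_ears": 1, "light_eyes": 1}, "husky"),
    ({"snow": 0, "pointy_ears": 1, "light_eyes": 0}, "husky"),
    ({"snow": 0, "pointy_ears": 0, "light_eyes": 1}, "husky"),
]

def train_stump(rows):
    """Learn the single rule "wolf iff x[feature] == value" with the best training accuracy."""
    best = None
    for feature in FEATURES:
        for value in (0, 1):
            acc = sum((x[feature] == value) == (label == "wolf") for x, label in rows) / len(rows)
            if best is None or acc > best[0]:
                best = (acc, feature, value)
    _, feature, value = best
    return lambda x: "wolf" if x[feature] == value else "husky"

def accuracy(model, rows):
    return sum(model(x) == label for x, label in rows) / len(rows)

def permutation_importance(model, rows, repeats=20, seed=0):
    """Average accuracy drop when one feature's values are shuffled across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows)
    importance = {}
    for feature in FEATURES:
        drops = []
        for _ in range(repeats):
            column = [x[feature] for x, _label in rows]
            rng.shuffle(column)
            # Copy each row, replacing only this feature with a shuffled value.
            shuffled = [({**x, feature: v}, label) for (x, label), v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled))
        importance[feature] = sum(drops) / repeats
    return importance

model = train_stump(ROWS)
importance = permutation_importance(model, ROWS)
for feature, score in importance.items():
    print(f"{feature:<12} {score:.2f}")
```

Here `pointy_ears` and `light_eyes` score exactly 0.00 because the learned rule never looks at them, while `snow` gets a clearly positive score: the same “it is looking at the snow” diagnosis as in the paper, obtained without opening up the model.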
It was difficult to fit all the information into a single presentation or post. Please let me know in the comments what else you’d like to know to start your developer’s journey into the world of the Intelligent Enterprise.
Stay healthy ❤️ and stay curious, everyone,
-Vitaliy (aka @Sygyzmundovych)