Multi-Cloud Approach to SAP Leonardo Machine Learning Using Google Cloud Platform AI-Platform
Machine learning, as an emerging technology, has not lacked exciting headlines and breakthroughs in recent years:
- In 2015, a deep-learning model outperformed humans in the ImageNet challenge, which requires classifying an image of an object into one of many categories.
- In 2016, AlphaGo defeated a world Go champion in a five-game match.
- In 2018, a self-driving car hit a pedestrian in Arizona…
Well, the last one is not exactly exciting, but the rapid advancement in the field is quite apparent.
In this post, I will explore how companies can approach machine learning with a multi-cloud approach.
Machine Learning as an Enterprise Solution
Enterprises nowadays also see the advancement in the AI field as an opportunity and are looking to adopt machine learning as a viable enterprise solution. According to Gartner’s 2019 CIO survey, “The number of enterprises implementing artificial intelligence (AI) grew 270 percent in the past four years and tripled in the past year.”
When an enterprise implements a machine learning solution, its goal is different from that of academic researchers, who try to push the boundaries of science. The enterprise usually tries to solve a specific business problem or improve a KPI in a certain area. The data it works with is also quite specific to its business, and domain-specific knowledge is required to understand what the data is actually telling you.
Let’s first look at how enterprises approach machine learning projects.
Data scientists are probably the first people who come to mind when we talk about machine learning. Their job usually focuses on finding a way to exploit a given dataset by applying various machine learning methods, in the hope that the result will provide meaningful business insight on which to base decisions. The success of that effort is then measured using statistical metrics such as accuracy, MSE (mean squared error), F score, etc. These metrics describe how the machine learning model performs very well, but they don’t directly answer the question of how the business should react.
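To make those metrics concrete, here is a minimal sketch in plain Python that computes accuracy and the F1 score (the most common F score) for a binary classifier, and MSE for a numeric forecast. The sample labels and values are made up for illustration, and in a real project you would normally use a library implementation rather than hand-written functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up classification results (1 = "customer churned", 0 = "customer stayed").
labels = [1, 0, 1, 1, 0, 1]
predicted = [1, 0, 0, 1, 1, 1]

# Made-up regression results (e.g. weekly sales forecast vs. actual sales).
actual_sales = [10.0, 12.0, 8.0]
forecast_sales = [11.0, 12.0, 6.0]

print("accuracy:", accuracy(labels, predicted))
print("F1 score:", f1_score(labels, predicted))
print("MSE:", mean_squared_error(actual_sales, forecast_sales))
```

Note that none of these numbers says whether, for example, a churn campaign is worth running. That translation into a business decision is the part the metrics leave open.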
Business analysts, on the other hand, usually do not have a deep understanding of how machine learning models work or how to evaluate them, but they fully understand the business issues and what needs to be done to improve the situation.
It is therefore very common to see machine learning projects run by a team that involves both data scientists and business people. One issue with this approach is that the two parties usually work with different tools and systems. Data scientists work with systems that have enough computing power to process complex calculations on large datasets. The business side works on its mission-critical ERP system or database, where stable operation is required, so it is not a good idea to run an expensive machine learning process on the same system. To integrate the systems, data engineers or system integrators are usually brought in to build the data pipelines or system interfaces.
With the emergence and popularization of cloud services, the boundaries between the roles described above are starting to fade. The cloud provides a variety of services that are easy to implement, compatible with each other, and easy to integrate. For example, if your business data is stored in the same cloud where you build your machine learning model, the data scientist no longer needs to build a complex data pipeline to fetch the data and feed it to the model. Similarly, with the debut of cloud machine learning services, data engineers and business analysts no longer need deep knowledge of machine learning to build a fully functional model.
See this post for a comprehensive explanation of machine learning related roles
Cloud and Machine Learning Service
Machine learning service on the cloud can be divided into 3 categories:
- VM service, where the user sets up their own machine learning environment, including network, OS, and data storage, and then builds a machine learning model on the VM instance
- Managed machine learning platform (PaaS), where the user builds a training package in any environment of their choice and only runs the model training on the cloud platform
- Fully automated machine learning services (MLaaS), where the user only needs to provide input data for their model, and the rest is handled by the cloud service
Most major cloud service providers, such as AWS, MS Azure, and Google Cloud Platform, provide all 3 kinds of service, with more focus on the 2nd and 3rd options in order to give users easy-to-use machine learning services.
Although fully automated MLaaS sounds very tempting, the functionality it provides is not yet suitable for many real-world applications. Take Google, a forerunner in machine learning research and development, as an example: it provides a fully automated machine learning service called AutoML. Currently, AutoML supports image, video, text, speech, and tabular data, and allows you to train a state-of-the-art machine learning model with just a few clicks. However, it lacks the flexibility to customize the model for your specific dataset and use case.
Consider a consumer product maker trying to build a mobile app that allows its customers to take a picture of a product in-store with their phone to retrieve more product-related information (such as this one). Such an app could use a machine learning model that classifies the product based on the picture taken by the user. The picture of the product will look very different depending on the angle from which it was taken. A top-down picture of a product might be very different from a picture taken from the side, and from some angles it might even look very similar to another product.
Humans are very adept at dealing with this problem because, through cognitive spatial mapping, we know exactly how we are looking at an object. For example: are we far from or close to the object? Are we looking down at it? Are we moving relative to it? A similar spatial map could be achieved if the machine learning model also used the gyroscope data recorded by the phone when the picture was taken, which should tell you whether the phone camera was pointing down or horizontally.
Other potentially useful metadata, such as the location where the picture was taken and the camera settings, will be crucial input data for the machine learning model, in addition to the gyroscope data, when considering a real-world application. But this metadata can be used only if you adopt a more complex model structure that takes both picture data and scalar data (metadata) as input, which is unfortunately not yet available in AutoML or other automated ML services.
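As a rough illustration of what "combining picture data with scalar data" means, the sketch below concatenates an image feature vector with scaled gyroscope readings and passes the result through a single dense layer with softmax. Everything here is an assumption made for illustration: the feature values stand in for the output of an image encoder, the pitch and roll ranges are assumed to be ±90 and ±180 degrees, and the weights are random rather than trained. A real implementation would use a deep learning framework with a convolutional branch for the image.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_inputs(image_features, pitch_deg, roll_deg):
    """Concatenate image features with gyroscope readings scaled to [-1, 1]."""
    metadata = [pitch_deg / 90.0, roll_deg / 180.0]
    return list(image_features) + metadata


def classify(fused, weights, biases):
    """One dense layer over the fused vector, followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


random.seed(0)

# Hypothetical output of an image encoder for one photo.
image_features = [0.2, 0.7, 0.1, 0.9]

# The phone was tilted 45 degrees downward when the photo was taken.
fused = fuse_inputs(image_features, pitch_deg=-45.0, roll_deg=10.0)

# Untrained, random weights: one row per product class.
num_classes = 3
weights = [[random.uniform(-1, 1) for _ in fused] for _ in range(num_classes)]
biases = [0.0] * num_classes

probabilities = classify(fused, weights, biases)
print("fused input length:", len(fused))
print("class probabilities:", probabilities)
```

The point of the design is that the classifier sees the viewing angle alongside the visual features, so the same visual pattern can be interpreted differently depending on how the phone was held.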
A complex model structure can be achieved by opting for the 1st or 2nd option, and Google Cloud Platform, for example, provides AI-Platform that allows you to build your own model without worrying about the underlying infrastructure.
SAP Multi-Cloud Approach to Machine Learning
Similar to Google Cloud Platform, SAP Cloud Platform also provides a highly automated machine learning service (the 3rd option) in the SAP Leonardo Machine Learning Foundation. It is called “training services” and is compatible with image, text, and speech data. Though it is on the roadmap, SAP currently does not provide services that let you build your own model (the 1st and 2nd options).
Continuing with our consumer product app, SAP Cloud Platform is likely a suitable platform on which to build it if most of your business data, including the product-related information, is already stored in SAP Cloud Platform services such as S/4 or HANA DB, so that everything can be managed within the same platform. However, the lack of flexible model-building capability makes it impossible to build and train an optimized machine learning model for the app.
By adopting a multi-cloud approach, you will be able to build and train the machine learning model on Google Cloud Platform, and then:
- Deploy the trained model on SAP Cloud Platform for the app to consume, or;
- Deploy the model on Google Cloud Platform and use service calls from SAP Cloud Platform to consume the model as a service.
Either approach will allow you to fully utilize the top-notch machine learning service provided by Google and combine it with your existing business data or applications.
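For the second option, the service call could look roughly like the sketch below, which only builds the HTTP request and does not send it. It assumes the AI Platform online prediction REST endpoint (a POST to a `:predict` URL with a JSON body containing an `instances` list, and binary data wrapped as `{"b64": ...}`), so check the current Google documentation before relying on it. The project name, model name, input keys (`image_bytes`, `pitch_deg`), and token are all hypothetical placeholders, and the input keys in particular depend on how your own model's serving signature is defined.

```python
import base64
import json
import urllib.request


def build_predict_request(project, model, image_bytes, pitch_deg, access_token):
    """Build (but do not send) an online prediction request for one picture."""
    url = f"https://ml.googleapis.com/v1/projects/{project}/models/{model}:predict"
    instance = {
        # Binary input is base64-encoded and wrapped in a {"b64": ...} object.
        "image_bytes": {"b64": base64.b64encode(image_bytes).decode("ascii")},
        # Hypothetical scalar metadata input of the custom model.
        "pitch_deg": pitch_deg,
    }
    body = json.dumps({"instances": [instance]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_predict_request(
    project="my-project",              # hypothetical GCP project ID
    model="product_classifier",        # hypothetical deployed model name
    image_bytes=b"\x89PNG placeholder",  # stands in for the real picture bytes
    pitch_deg=-45.0,
    access_token="PLACEHOLDER_TOKEN",  # a real OAuth 2.0 token is needed to send
)

payload = json.loads(request.data)
print(request.get_method(), request.full_url)
print("inputs sent:", sorted(payload["instances"][0].keys()))
```

With a valid token, passing the request to `urllib.request.urlopen` from an application running on SAP Cloud Platform would send it, and the JSON response would be expected to carry the model output under a `predictions` key.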
It is also important to realize that building and using a machine learning model in your application is not a one-way trip that goes from building a model to deploying and consuming it. The return trip that feeds the prediction results back into model building is just as important.
Let’s say that you successfully build an initial machine learning model that is capable of recognizing products with sufficient accuracy. It is still all but guaranteed that the model will misidentify some products as new pictures keep coming in. To maintain and continue to improve its accuracy, it is crucial that you feed back the new pictures, especially the ones the model tends to get wrong, and re-train the model on the newly seen data.
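One simple way to decide which pictures to feed back is sketched below: keep every prediction that a user corrected, plus every prediction the model was unsure about. The `Prediction` record, the field names, and the 0.6 confidence threshold are hypothetical choices made for illustration, not part of any SAP or Google service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    image_id: str
    predicted_label: str
    confidence: float
    confirmed_label: Optional[str] = None  # label confirmed by the user, if any


def select_for_retraining(
    predictions: List[Prediction], confidence_threshold: float = 0.6
) -> List[Prediction]:
    """Return predictions that were corrected by a user or had low confidence."""
    selected = []
    for p in predictions:
        wrong = p.confirmed_label is not None and p.confirmed_label != p.predicted_label
        uncertain = p.confidence < confidence_threshold
        if wrong or uncertain:
            selected.append(p)
    return selected


# Made-up prediction history collected by the app.
history = [
    Prediction("img-001", "shampoo", 0.97, "shampoo"),      # correct, confident
    Prediction("img-002", "shampoo", 0.91, "conditioner"),  # corrected by the user
    Prediction("img-003", "soap", 0.42),                    # low confidence
    Prediction("img-004", "soap", 0.88),                    # confident, unconfirmed
]

retraining_queue = select_for_retraining(history)
print([p.image_id for p in retraining_queue])
```

The selected pictures would then be labeled where necessary and added to the training set before the next training run.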
Also, your business is very likely to continue to release new products, and similarly, make improvements to the old products. Any changes in your product lineup will require you to re-train your model as well so that the model will be able to recognize the new products.
The takeaway message is that multi-cloud is not as simple as connecting two cloud environments. It is more of an integrated environment where both models and data can flow back and forth smoothly, so that you can benefit from the best of both clouds.
In the following series of posts, I will explore a working example of how to integrate a machine learning model built on Google Cloud Platform with SAP Leonardo, and also how to run inference on models deployed on Leonardo, run a retrain on GCP, and redeploy the retrained model on Leonardo automatically.