TL;DR: Data Attribute Recommendation uses dynamically created deep learning models (built with TensorFlow and trained on GPUs).
Whether you are considering using the Data Attribute Recommendation service in production, planning to compare it with other ML solutions, or are simply a “deep dive” person, at some point you will want to know how Data Attribute Recommendation makes predictions. In this blog post, I give more technical details and answer the most common questions.
I assume that you are familiar with the basic concepts of machine learning. If not, please have a look at the links at the end of this blog post for further information.
Data Attribute Recommendation solves multi-label classification problems, a subset of supervised machine learning. Supervised means that training data with labels (the correct values) must be provided first in order to train a model. By multi-label, I mean that one model may be trained to predict several labels at once, independently of each other (e.g. product type, production plant, and danger flag).
Before uploading data to Data Attribute Recommendation, the user must specify its structure, in other words, name all the features and labels along with their types. This is called a data set schema.
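To make the idea of a data set schema more concrete, here is a minimal sketch in Python. The key names (`name`, `features`, `labels`, `label`, `type`), the type strings, and the `check_schema` helper are assumptions chosen for illustration; please check the service documentation for the exact format the API expects.

```python
# Hypothetical schema for a product data set. Key names and type strings
# are assumptions for illustration, not the confirmed service format.
schema = {
    "name": "product-schema",
    "features": [
        {"label": "description", "type": "TEXT"},
        {"label": "manufacturer", "type": "CATEGORY"},
        {"label": "price", "type": "NUMBER"},
    ],
    "labels": [
        {"label": "product_type", "type": "CATEGORY"},
        {"label": "production_plant", "type": "CATEGORY"},
        {"label": "danger_flag", "type": "CATEGORY"},
    ],
}

# Assumed set of column types, used only by this sketch.
ALLOWED_TYPES = {"TEXT", "CATEGORY", "NUMBER"}


def check_schema(schema):
    """Return a list of problems found in the schema (empty if none)."""
    problems = []
    seen = set()
    for section in ("features", "labels"):
        columns = schema.get(section, [])
        if not columns:
            problems.append(f"no {section} declared")
        for column in columns:
            name, col_type = column.get("label"), column.get("type")
            if name in seen:
                problems.append(f"duplicate column name: {name}")
            seen.add(name)
            if col_type not in ALLOWED_TYPES:
                problems.append(f"unknown type for {name}: {col_type}")
    return problems


print(check_schema(schema))
```

A local check like this only catches obvious mistakes (missing sections, duplicate names, unexpected types) before upload; the service performs its own validation.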
Data Attribute Recommendation accepts data in UTF-8 encoded CSV format with a semicolon (;) as the delimiter. To reduce the size of the transmitted data, the CSV file may be compressed with the gzip tool; in that case, the .gz extension should be added to the file name.
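As a rough illustration of that file format, the following Python sketch writes a UTF-8, semicolon-delimited CSV file and compresses it with gzip when the file name ends in `.gz`. It uses only the standard library. The function names, the column names, and the presence of a header row are my assumptions for the example.

```python
import csv
import gzip
import os
import tempfile


def write_training_csv(rows, fieldnames, path):
    """Write rows as UTF-8, semicolon-delimited CSV; gzip if path ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter=";")
        writer.writeheader()  # header row with column names (assumed)
        writer.writerows(rows)


def read_training_csv(path):
    """Read the file back, mainly to verify what was written."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=";"))


# Hypothetical sample data; note the semicolon inside the first description.
rows = [
    {"description": "Säge; 500 W", "manufacturer": "ACME", "product_type": "tools"},
    {"description": "Cordless drill", "manufacturer": "ACME", "product_type": "tools"},
]
path = os.path.join(tempfile.mkdtemp(), "training.csv.gz")
write_training_csv(rows, ["description", "manufacturer", "product_type"], path)
```

The `csv` module quotes any value that contains the delimiter, so a description with a semicolon in it survives the round trip.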
Before speaking about machine learning models, it makes sense to introduce another notion: model templates. In a nutshell, a model template is a set of data preprocessing steps and logic that creates a model. Those templates are a result of multiple co-innovation projects with customers from different industries and with different use cases.
So far, Data Attribute Recommendation has two built-in model templates, and both use feed-forward artificial neural networks (deep learning architectures). The current list of templates is available in the service documentation. In the future, additional templates with other approaches (RNNs, CNNs, ensembles) may be added to the service as needed.
The second part of a model template consists of model building logic that takes a data set schema as input and generates an empty (untrained) model for it. The consequence is that a unique model architecture is created for every data set.
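The "schema in, architecture out" idea can be sketched in a few lines of plain Python. This is not the service's actual model building logic: the `ModelPlan` structure, the encoder names, the hidden layer sizes, and the `classes_per_label` argument are all hypothetical. The sketch only shows how different schemas lead to differently shaped models, with one input branch per feature and one output head per label.

```python
from dataclasses import dataclass, field


@dataclass
class ModelPlan:
    """Hypothetical description of an untrained feed-forward model."""
    inputs: dict = field(default_factory=dict)        # feature name -> encoder kind
    hidden_units: list = field(default_factory=list)  # sizes of shared dense layers
    outputs: dict = field(default_factory=dict)       # label name -> number of classes


# Assumed mapping from column type to input encoding, for illustration only.
ENCODERS = {"TEXT": "token-embedding", "CATEGORY": "one-hot", "NUMBER": "scaling"}


def plan_model(schema, classes_per_label, hidden_units=(256, 128)):
    """Derive a model plan from a schema: one input per feature, one head per label."""
    plan = ModelPlan(hidden_units=list(hidden_units))
    for feature in schema["features"]:
        plan.inputs[feature["label"]] = ENCODERS[feature["type"]]
    for label in schema["labels"]:
        # Each label gets its own output head, so labels are predicted independently.
        plan.outputs[label["label"]] = classes_per_label[label["label"]]
    return plan


# Hypothetical schema and class counts.
schema = {
    "features": [
        {"label": "description", "type": "TEXT"},
        {"label": "price", "type": "NUMBER"},
    ],
    "labels": [
        {"label": "product_type", "type": "CATEGORY"},
        {"label": "danger_flag", "type": "CATEGORY"},
    ],
}
plan = plan_model(schema, {"product_type": 12, "danger_flag": 2})
print(plan)
```

Adding a feature or a label to the schema changes the plan, which is the point of the paragraph above: the architecture follows from the data set schema rather than being fixed in advance.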
Machine learning model
Let’s imagine you’ve uploaded your data along with a data set schema and it has been successfully validated. You’ve also chosen the model template that best fits your needs. Now you are ready to train a model, which is just another API call. When you do this, Data Attribute Recommendation requests a GPU container (a CPU container if you are on a trial account) from the underlying platform and trains a model with your data. A few points are worth mentioning here:
- First, TensorFlow is used as the framework for training and prediction.
- Second, before starting the training, the service performs a stratified split based on the label values and divides the data into three parts (training, validation, and test) in the proportion 80/10/10. After the model is trained, you can retrieve metrics such as accuracy or F1 score, measured on the test part of the data, via a GET API call on the model.
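For readers who have not met a stratified split before, here is a minimal standard-library sketch of an 80/10/10 split that keeps the label distribution roughly the same in each part. It is not the service's implementation; the function name, the fixed seed, and the choice to stratify on the tuple of all label values are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_keys, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows into train/validation/test, preserving label proportions."""
    # Group rows by their combination of label values (assumed strategy).
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in label_keys)].append(row)

    rng = random.Random(seed)
    train, validation, test = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_validation = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        validation.extend(group[n_train:n_train + n_validation])
        test.extend(group[n_train + n_validation:])  # remainder goes to test
    return train, validation, test


# Hypothetical data: 70 rows labeled "tools" and 30 labeled "toys".
rows = [{"id": i, "product_type": "tools" if i < 70 else "toys"} for i in range(100)]
train, validation, test = stratified_split(rows, ["product_type"])
print(len(train), len(validation), len(test))  # 80 10 10
```

Because each label group is split separately, the training part ends up with 56 "tools" and 24 "toys" rows, the same 70/30 ratio as the full data set. Very rare label combinations need extra care, since a group with only a handful of rows cannot be divided 80/10/10.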
It’s important to understand that the final model can be viewed as a combination of training data, data set schema, and model template.
Initially, this post aimed to give a brief introduction to the machine learning part of Data Attribute Recommendation, but it is a challenge to isolate only this area when speaking about a comprehensive end-to-end implementation.
I hope this is useful to you. It would be great to receive your feedback/questions in the comments. If you’d like to read more on Data Attribute Recommendation, and be notified about new posts, follow Data Attribute Recommendation.
- https://help.sap.com/dar – Data Attribute Recommendation documentation
- https://developers.sap.com/mission.cp-aibus-data-attribute.html – Data Attribute Recommendation tutorial mission
- https://platformx-d8bd51250.dispatcher.us2.hana.ondemand.com/protected/index.html#/serviceCatalog/55681385-43a6-4d92-8706-6340a74a8f2d – SAP Cloud Platform Discovery Center
- https://help.sap.com/viewer/204887c82baf42cc929c3d8b2286b44e – AI & ML Glossary