Machine Learning inside Data Attribute Recommendation

TL;DR: Data Attribute Recommendation uses dynamically created deep learning models (built with TensorFlow and trained on GPUs).

Introduction

Whether you are considering using the Data Attribute Recommendation service in production, comparing it with other ML solutions, or you are simply a “deep dive” person, at some point you will want to know how Data Attribute Recommendation makes its predictions. In this blog post, I will give more technical details and answer the most common questions.

I assume that you are familiar with the basic concepts of machine learning. If not, please have a look at the links at the end of this blog post for further information.

 

Task

Data Attribute Recommendation solves multi-label classification problems, which are a subset of supervised machine learning. Supervised means that training data with labels (the correct values) must be provided first in order to train a model. By multi-label, I mean that one model may be trained to predict several labels at once, independently of each other (e.g. product type, production plant, and danger flag).

 

Features/Labels

Before uploading data to Data Attribute Recommendation, the user must specify its structure, in other words, name all the features and labels along with their types. This is called a data set schema.

Example
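As an illustration, a data set schema for the product example mentioned earlier could look roughly like the sketch below. The key names (`name`, `features`, `labels`, `label`, `type`), the type names, and all column names are assumptions made for illustration; the service documentation is the authority on the exact format.

```python
import json

# Hypothetical data set schema: key names, type names and column names
# are assumptions for illustration, not the authoritative service format.
schema = {
    "name": "product-master-data",
    "features": [
        {"label": "description", "type": "TEXT"},
        {"label": "manufacturer", "type": "CATEGORY"},
        {"label": "price", "type": "NUMBER"},
    ],
    "labels": [
        {"label": "product_type", "type": "CATEGORY"},
        {"label": "production_plant", "type": "CATEGORY"},
        {"label": "danger_flag", "type": "CATEGORY"},
    ],
}


def column_names(schema):
    """Return all column names in schema order: features first, then labels."""
    return [column["label"] for column in schema["features"] + schema["labels"]]


print(json.dumps(schema, indent=2))
print(column_names(schema))
```

The point of the sketch is only that every column is named and typed up front, and that features and labels are declared separately.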

Dataset

Data Attribute Recommendation accepts data in UTF-8 encoded CSV format with a semicolon (;) as the delimiter. To decrease the size of the transmitted data, the CSV file may be compressed with the gzip tool. In this case, the .gz extension must be added to the file name.
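A minimal sketch of producing such a file with the Python standard library is shown below. The function names, column names, and sample rows are made up for illustration; only the format (UTF-8, semicolon delimiter, gzip compression, .gz extension) comes from the description above.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, header, rows):
    """Write rows as UTF-8, semicolon-delimited CSV compressed with gzip."""
    # newline="" lets the csv module control line endings itself.
    with gzip.open(path, mode="wt", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(header)
        writer.writerows(rows)


def read_gzipped_csv(path):
    """Read the file back, mainly to check that it round-trips."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as f:
        return list(csv.reader(f, delimiter=";"))


# Hypothetical sample data; note the semicolon and the non-ASCII characters.
header = ["description", "manufacturer", "price", "product_type"]
rows = [
    ["Cordless drill 18V; with case", "Acme", "129.90", "power_tool"],
    ["Häkelnadel 3 mm", "Müller", "2.50", "craft_supply"],
]

path = os.path.join(tempfile.mkdtemp(), "products.csv.gz")
write_gzipped_csv(path, header, rows)
round_trip = read_gzipped_csv(path)
print(round_trip == [header] + rows)
```

Using the `csv` module rather than joining strings by hand means that values which themselves contain a semicolon are quoted automatically.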

Model templates

Before speaking about machine learning models, it makes sense to introduce another notion: model templates. In a nutshell, a model template is a set of data preprocessing steps and logic that creates a model. Those templates are a result of multiple co-innovation projects with customers from different industries and with different use cases.

A model template consists of preprocessing and model builder logic

So far, Data Attribute Recommendation has two built-in model templates. Both use artificial neural networks (deep learning architectures), specifically feed-forward networks. The current list of templates is available in the service documentation. In the future, additional templates with other approaches (RNNs, CNNs, ensembles) may be added to the service as needed.

The second part of a model template consists of model building logic that takes a data set schema as input and generates an empty (untrained) model for it. As a consequence, a unique model architecture is created for every dataset.
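To make the idea of schema-driven model building more concrete, here is a toy sketch in plain Python. It is not the service's builder logic, which is not described in this post: the schema layout, the encoder names, the hidden layer sizes, and the choice of one softmax output per label are all assumptions. It only shows how a different schema leads to a different architecture description.

```python
# Hypothetical mapping from feature type to an input encoder.
ENCODERS = {"TEXT": "text_embedding", "CATEGORY": "one_hot", "NUMBER": "normalize"}


def build_model_spec(schema, n_classes, hidden_units=(256, 128)):
    """Derive a feed-forward architecture description from a data set schema.

    n_classes maps each label name to its number of distinct values.
    The result describes an untrained model; no weights are involved.
    """
    inputs = [
        {"name": feature["label"], "encoder": ENCODERS[feature["type"]]}
        for feature in schema["features"]
    ]
    hidden = [
        {"layer": "dense", "units": units, "activation": "relu"}
        for units in hidden_units
    ]
    # Assumption: one independent softmax output per label.
    outputs = [
        {"name": label["label"], "layer": "dense",
         "units": n_classes[label["label"]], "activation": "softmax"}
        for label in schema["labels"]
    ]
    return {"inputs": inputs, "hidden": hidden, "outputs": outputs}


# Hypothetical schema and class counts, for illustration only.
example_schema = {
    "features": [
        {"label": "description", "type": "TEXT"},
        {"label": "price", "type": "NUMBER"},
    ],
    "labels": [
        {"label": "product_type", "type": "CATEGORY"},
        {"label": "danger_flag", "type": "CATEGORY"},
    ],
}
spec = build_model_spec(example_schema, {"product_type": 40, "danger_flag": 2})
for output in spec["outputs"]:
    print(output["name"], output["units"])
```

Adding a feature or a label to the schema changes the returned description, which mirrors the statement that each dataset gets its own architecture.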

Machine learning model

Let’s imagine you’ve uploaded your data along with a data set schema and it has been successfully validated. You’ve also chosen the model template that best fits your needs. Now you are ready to train a model, which is just another API call. When you make it, Data Attribute Recommendation requests a GPU container (a CPU container if you are on a trial account) from the underlying platform and trains a model with your data. A few points are worth mentioning here:

  • First, TensorFlow is used as a framework for training and prediction.
  • Second, before starting the training, the service performs a stratified split based on the label value(s) and divides the data into three parts: training, validation, and test splits in the proportion 80/10/10. After the model is trained, you can see metrics such as accuracy or F1-score, computed on the test split, via a GET API call on the model.
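The stratified 80/10/10 split can be sketched in a few lines of plain Python. This is an illustrative implementation, not the service's code; the function name, the rounding rule, and the sample data are assumptions. With several labels, one option (also an assumption) is to let `label_of` return a tuple of all label values.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows into train/validation/test, preserving label proportions."""
    rng = random.Random(seed)

    # Group rows by label so that each group is split separately.
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, validation, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        validation.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder goes to test
    return train, validation, test


# Hypothetical data: 70 rows of class "A" and 30 rows of class "B".
rows = [("item%d" % i, "A" if i < 70 else "B") for i in range(100)]
train, validation, test = stratified_split(rows, label_of=lambda row: row[1])
print(len(train), len(validation), len(test))
```

Because each class is split on its own, the 70/30 class ratio of the sample data is kept in all three parts, which a purely random split does not guarantee.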

It’s important to understand that the final model is the product of three things: the training data, the dataset schema, and the model template.

Final words

Initially, this post aimed to give a brief introduction to the machine learning part of Data Attribute Recommendation, but it’s a challenge to isolate only this area when speaking about a comprehensive end-to-end implementation.

 

I hope this is useful to you. It would be great to receive your feedback or questions in the comments. If you’d like to read more about Data Attribute Recommendation and be notified about new posts, follow Data Attribute Recommendation.

 

Useful links:
