How does AutoML work in Data Attribute Recommendation?
In this blog post, I will share some insights into the new AutoML template in Data Attribute Recommendation (DAR). In case you haven’t read the great blog post about DAR, you can have a quick look at Machine Learning inside Data Attribute Recommendation.
What is AutoML?
With its recent release, the DAR service expands its capabilities with a new AutoML template. With AutoML, citizen data scientists, business users and LOB users can create traditional ML models without machine learning knowledge and get answers to their business questions through machine learning models. This approach gives them more time to focus on business tasks and lowers the threshold for using machine learning models in production. Furthermore, AutoML can be used to train machine learning models in an unattended mode integrated into solutions, or it can serve as a baseline model for machine learning experts.
How to use AutoML via DAR?
AutoML automates many steps in the machine learning process, in particular data preparation, feature selection, algorithm selection and parameter optimization. The user only provides an input dataset and dataset schema information, just as for the existing templates in DAR. AutoML performs all steps required to process the data for training, builds a number of machine learning models and, in the end, returns the best-performing model, which can be easily deployed in the cloud. To use AutoML, it is enough to select the AutoML template ID in the DAR API.
How does AutoML find the best model?
AutoML includes a set of preprocessing and ML algorithms, each with a multitude of hyper-parameters. Within a pre-defined time budget, AutoML runs a number of experiments, each of which trains a different combination of algorithms and hyper-parameters.
After this experimentation phase, AutoML selects the best-performing combination of preprocessing and ML model and returns it as an artifact that can be deployed in the same way as models from the other DAR templates.
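The experimentation phase described above can be pictured as a time-budgeted loop. The following is only a minimal sketch and not the DAR implementation: the search space, the algorithm names and the `evaluate` function are hypothetical stand-ins, and configurations are sampled at random purely to keep the example short.

```python
import random
import time

# Hypothetical search space: each algorithm has its own hyper-parameters.
SEARCH_SPACE = {
    "logistic_regression": {"C": [0.01, 0.1, 1.0, 10.0]},
    "random_forest": {"n_estimators": [50, 100, 200], "max_depth": [4, 8, 16]},
}


def sample_config(rng):
    """Pick one algorithm and one value for each of its hyper-parameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values)
              for name, values in SEARCH_SPACE[algorithm].items()}
    return {"algorithm": algorithm, "params": params}


def evaluate(config, rng):
    """Stand-in for training and validating a model; returns a made-up score."""
    base = 0.80 if config["algorithm"] == "random_forest" else 0.75
    return base + rng.uniform(0.0, 0.1)


def run_experiments(time_budget_s, max_experiments=1000, seed=0):
    """Run experiments until the time budget is used up and keep the best one."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_config, best_score, n_runs = None, float("-inf"), 0
    while n_runs < max_experiments and time.monotonic() < deadline:
        config = sample_config(rng)
        score = evaluate(config, rng)
        n_runs += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, n_runs


best_config, best_score, n_runs = run_experiments(time_budget_s=0.05)
print(f"best of {n_runs} experiments: {best_config} (score {best_score:.3f})")
```

In a real setting, each call to `evaluate` would train a model and measure it on validation data, so the time budget would allow far fewer experiments than in this toy loop.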
It is important to note that AutoML does not run every possible configuration to find the best model, as this would take far too much time. Instead, it uses a Bayesian Optimization algorithm to explore certain places in the hyper-parameter search space and then converges to the optimum. To prevent converging to a local optimum, AutoML combines random search with the Bayesian Optimization. It also includes an early-stopping mechanism that avoids unnecessary iterations on simple tasks if multiple subsequent experiments do not show better performance.
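To make the interplay of guided search, random exploration and early stopping more concrete, here is a toy sketch. It is neither Bayesian Optimization nor the DAR implementation: the guided step is replaced by a simple perturbation of the best point seen so far, and the objective as well as the `explore_prob` and `patience` parameters are assumptions chosen for illustration.

```python
import math
import random


def objective(x):
    """Toy validation score on [0, 1]: local optimum near 0.2, global one near 0.8."""
    return (0.6 * math.exp(-((x - 0.2) ** 2) / 0.005)
            + math.exp(-((x - 0.8) ** 2) / 0.005))


def search(max_iters=500, explore_prob=0.3, patience=60, seed=1):
    """Mix guided and random proposals; stop early when nothing improves."""
    rng = random.Random(seed)
    best_x = rng.random()
    best_score = objective(best_x)
    stale = 0  # consecutive experiments without improvement
    history = [best_score]  # best score after each experiment
    for _ in range(max_iters):
        if rng.random() < explore_prob:
            # Random search step: can jump out of a local optimum.
            x = rng.random()
        else:
            # Guided step (stand-in for a Bayesian Optimization proposal):
            # look close to the best configuration found so far.
            x = min(1.0, max(0.0, best_x + rng.gauss(0.0, 0.02)))
        score = objective(x)
        if score > best_score:
            best_x, best_score, stale = x, score, 0
        else:
            stale += 1
        history.append(best_score)
        if stale >= patience:
            break  # early stopping: no improvement for `patience` experiments
    return best_x, best_score, history


best_x, best_score, history = search()
print(f"stopped after {len(history) - 1} experiments, best x = {best_x:.3f}")
```

Without the random steps, a search that starts near 0.2 would have little chance of leaving that region; the occasional random proposal is what gives it a chance to discover the better region around 0.8.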
What algorithms are included in AutoML?
Currently, AutoML supports many classification algorithms that are well suited for a broad range of tasks. Further releases are planned to support more capabilities such as regression.
Here is a quick summary of the algorithms that are currently in the AutoML portfolio:
Of course, not every preprocessing algorithm is activated for every dataset, as AutoML automatically adapts the preprocessing pipeline to each dataset. For example, the text preprocessing steps are only active when there is a text column, and feature selection is disabled when there are only a few columns.
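The idea of a pipeline that adapts to the dataset can be sketched in a few lines. This is an illustration under assumptions, not how DAR decides internally: the step names, the column type labels and the `min_columns_for_selection` threshold are all hypothetical, and only the text and feature selection rules come from the description above.

```python
def build_preprocessing_pipeline(column_types, min_columns_for_selection=10):
    """Return the names of the preprocessing steps to activate.

    column_types maps a column name to "text", "category" or "number".
    """
    steps = []
    kinds = set(column_types.values())
    if "text" in kinds:
        # Text preprocessing is only active when there is a text column.
        steps.append("text_vectorization")
    if "category" in kinds:
        steps.append("categorical_encoding")  # hypothetical step
    if "number" in kinds:
        steps.append("numeric_scaling")  # hypothetical step
    if len(column_types) >= min_columns_for_selection:
        # Feature selection is skipped when there are only a few columns.
        steps.append("feature_selection")
    return steps


small_dataset = {"description": "text", "price": "number"}
print(build_preprocessing_pipeline(small_dataset))
```

For the two-column example, the sketch activates the text and numeric steps and leaves feature selection out, because the dataset has fewer columns than the assumed threshold.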
I hope you liked this blog post about the AutoML template. If you’d like to read more on Data Attribute Recommendation or AutoML and be notified about new posts, follow Data Attribute Recommendation.