Author: Dmitry Buslov

HANA AutoML library

Let’s assume you have to prepare a machine learning model for a classification or regression task.
All your data is already in HANA or in a flat (CSV) file.
Everything you need is here: https://github.com/dan0nchik/SAP-HANA-AutoML (This library is an open-source research project and is not part of any official SAP products.)

That is a joke, of course, but hana_automl goes through most (not yet all) AutoML steps and makes data science work easier.

This library is written in Python and built on top of other awesome libraries:

  • hana_ml
  • Optuna
  • BayesianOptimization
  • Streamlit

To install it, you just need:

pip3 install Cython
pip3 install hana_automl
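
To confirm the installation worked before going further, a quick check like the one below can help. It is only a sketch: it assumes the import names match the pip package names (hana_ml and hana_automl, as used in the imports later in this post), and missing_packages is a helper name made up for this example.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the import names from `names` that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: import names are the same as the pip package names.
    missing = missing_packages(["hana_ml", "hana_automl"])
    print("missing:", missing or "none")
```

If the list is not empty, rerun the pip commands in the same Python environment you plan to use.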

After installation, it is quite easy to get started:

from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address', user='user', password='password', port=39015)

# replace with credentials of user that will be created or granted a role to run PAL.
setup_user(connection_context=cc, username='user_new', password="password_new")

setup_user is an optional helper for when you need to create a new user for experiments.
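
The connection snippet above hardcodes the credentials. If you would rather keep them out of the script, one option is to read them from environment variables. This is a minimal sketch: the variable names (HANA_ADDRESS, HANA_PORT, HANA_USER, HANA_PASSWORD) and the helper connection_kwargs_from_env are hypothetical; only the ConnectionContext keyword arguments come from the snippet above.

```python
import os


def connection_kwargs_from_env(env=os.environ):
    """Build ConnectionContext keyword arguments from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    return {
        "address": env["HANA_ADDRESS"],
        # Fall back to the port used in the example above.
        "port": int(env.get("HANA_PORT", "39015")),
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Usage (needs a reachable HANA instance, so it is left commented out):
# from hana_ml.dataframe import ConnectionContext
# cc = ConnectionContext(**connection_kwargs_from_env())
```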

After that, you only need to call fit and predict, and then wait…

from hana_automl.automl import AutoML

model = AutoML(cc)
model.fit(
  file_path='path to training dataset', # the data may also be a HANA table/view or a pandas DataFrame (see the documentation)
  steps=10, # number of iterations
  target='target', # column to predict
  time_limit=120 # time limit in seconds
)
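
If you start from a single flat file, you may want separate training and test files before calling fit and predict. The sketch below does that with the standard library only. It is not part of hana_automl: split_csv and the file names are made up for illustration, and both output files keep every column (including the target) so that predictions can be compared with the true values afterwards.

```python
import csv
import random


def split_csv(source_path, train_path, test_path, test_fraction=0.2, seed=0):
    """Shuffle the rows of a CSV file and write train/test files.

    Both files get the original header. Returns a tuple
    (number of training rows, number of test rows).
    """
    with open(source_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)

    parts = ((test_path, rows[:n_test]), (train_path, rows[n_test:]))
    for path, part in parts:
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(rows) - n_test, n_test


# Usage with hypothetical file names:
# split_csv("data.csv", "train.csv", "test.csv")
```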

Then predict:

model.predict( file_path='path to test dataset', id_column='ID', verbose=1 )

You can find all documentation here – https://sap-hana-automl.readthedocs.io/en/latest/index.html

It is also possible to run all these steps from a UI instead of Python, with the help of Streamlit.

The UI looks like this: [screenshot: Streamlit client]

To start the UI, you need three steps:

  1. Clone repository: git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
  2. Install dependencies: pip3 install -r requirements.txt
  3. Run GUI: streamlit run ./web.py

OK, why should you try it?

Have a look at this example: https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb

APL is awesome, but it has a strong focus on speed; for more accurate models you need some time and PAL. That is where hana_automl can help.

It is also possible to build not just a single model but a blend of models. To enable the ensemble, just pass ensemble=True to the hana_automl.automl.AutoML.fit() function when fitting the AutoML model.
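
For readers new to blending, here is a generic illustration of the idea: average the class probabilities predicted by several models and pick the most likely class. This is not how hana_automl implements its ensemble (the post does not describe the internals); it is only a sketch of the general technique, and blend_probabilities and predict_labels are names invented for this example.

```python
def blend_probabilities(model_probs, weights=None):
    """Weighted average of per-model class probabilities.

    model_probs: one entry per model; each entry is a list of rows,
    and each row is a list of class probabilities for one sample.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    total = sum(weights)

    blended = []
    for rows in zip(*model_probs):  # the same sample across all models
        blended.append([
            sum(w * p for w, p in zip(weights, class_probs)) / total
            for class_probs in zip(*rows)  # the same class across all models
        ])
    return blended


def predict_labels(blended):
    """Pick the index of the highest blended probability for each sample."""
    return [row.index(max(row)) for row in blended]


# Two hypothetical models that disagree on the second sample.
model_a = [[0.9, 0.1], [0.45, 0.55]]
model_b = [[0.7, 0.3], [0.8, 0.2]]
labels = predict_labels(blend_probabilities([model_a, model_b]))
```

With hana_automl itself, none of this is needed: the ensemble=True flag mentioned above is the only switch.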

There is big potential for improvement, and contributions are very welcome!

If you have any ideas, post them here: https://github.com/dan0nchik/SAP-HANA-AutoML/issues

P.S. This is a project by @While-true-codeanything and @dan0nchik, very talented students…

Don’t wait: try it on your dataset and share your results…

      2 Comments
      Andreas Forster

      I trained my first hana_autml regression on PAL 🙂
      Big kudos for putting such a cool project together Dmitry Buslov

      Jeremy Yu

      This is really great Dmitry, makes using PAL much easier in Python!