HANA AutoML library
Let’s assume you have to prepare a machine learning model for a classification or regression task.
All your data is already in HANA or in a flat (CSV) file.
Everything you need is here: https://github.com/dan0nchik/SAP-HANA-AutoML (this library is an open-source research project and is not part of any official SAP product).
That is a joke, of course, but hana_automl does go through all (well, not quite all yet) AutoML steps and makes data science work easier.
The library is written in Python and built on top of other awesome libraries.
To install it, you just need:
pip3 install Cython
pip3 install hana_automl
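If you want to confirm that the installation worked before connecting to HANA, a small check like the one below can help. It is only a sketch: the helper name check_installed is made up for this post, and it assumes the packages are importable as Cython, hana_ml and hana_automl (the last two are the module names used in the snippets of this post).

```python
from importlib.util import find_spec


def check_installed(modules=("Cython", "hana_ml", "hana_automl")):
    """Return a dict mapping each top-level module name to True if it can be found."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If anything is reported as missing, rerun the pip3 commands above in the same Python environment.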
After installation, it is quite easy to start:
from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(address='address', user='user', password='password', port=39015)

# replace with credentials of user that will be created or granted a role to run PAL.
setup_user(connection_context=cc, username='user_new', password="password_new")
setup_user is an optional helper for when you need to create a new user for experiments.
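In a real project you probably do not want the address and password hard-coded in a script. Here is a minimal sketch that reads them from environment variables instead. The HANA_* variable names and both helper functions are assumptions made up for this example; only ConnectionContext and its address, user, password and port arguments come from the snippet above.

```python
import os


def hana_connection_kwargs(env=os.environ):
    """Collect ConnectionContext keyword arguments from environment variables.

    The HANA_* variable names are hypothetical, chosen only for this sketch.
    """
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "address": env["HANA_ADDRESS"],
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        # Fall back to the port used in the example above.
        "port": int(env.get("HANA_PORT", "39015")),
    }


def connect(env=os.environ):
    """Open a connection; hana_ml is imported lazily so the helper above works without it."""
    from hana_ml.dataframe import ConnectionContext

    return ConnectionContext(**hana_connection_kwargs(env))
```

With the variables exported in your shell, cc = connect() gives you the same kind of object as the hard-coded version.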
After that, you just need to call fit and predict, and wait…
from hana_automl.automl import AutoML

model = AutoML(cc)
model.fit(
    file_path='path to training dataset',  # it may be HANA table/view, or pandas DataFrame
    steps=10,  # number of iterations
    target='target',  # column to predict
    time_limit=120  # time limit in seconds
)

model.predict(
    file_path='path to test dataset',
    id_column='ID',
    verbose=1
)
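The fit and predict calls above expect separate training and test datasets. If you only have a single CSV file, one way to produce the two files is a simple random holdout split. This is a sketch using only the standard library; the split_csv helper, the 20% test share and the file names are assumptions for illustration and are not part of hana_automl.

```python
import csv
import random


def split_csv(source_path, train_path, test_path, test_fraction=0.2, seed=42):
    """Split one CSV file into a training and a test file, keeping the header in both.

    Returns the number of data rows written to (train, test).
    """
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    for path, part in ((train_path, train_rows), (test_path, test_rows)):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(part)
    return len(train_rows), len(test_rows)
```

For example, split_csv('all.csv', 'train.csv', 'test.csv') writes the two files, which you can then pass as file_path to fit and predict. Whether the test file should still contain the target column depends on what you want to do with the predictions, so check the documentation for your case.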
You can find the full documentation here: https://sap-hana-automl.readthedocs.io/en/latest/index.html
It is also possible to run all these steps from a UI built with Streamlit instead of from Python.
This UI looks like this:
To start the UI, you need 3 steps:
- Clone repository:
git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
- Install dependencies:
pip3 install -r requirements.txt
- Run GUI:
streamlit run ./web.py
OK, so why should you try it?
Have a look at this example: https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb
APL is awesome, but it has a strong focus on speed. For more accurate models you need some time and PAL, and that is where hana_automl can help.
It is also possible to build not just a single model, but a blend of models. To enable ensembling, just pass ensemble=True to the hana_automl.automl.AutoML.fit() function when creating the AutoML model.
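If blending is new to you, the idea is to combine the predictions of several models into one. The sketch below shows the simplest variant for regression, a weighted average. It only illustrates the general technique: the blend function is made up for this post, and it is not a description of how hana_automl builds its ensemble internally.

```python
def blend(predictions_per_model, weights=None):
    """Weighted average of per-model predictions for a regression task.

    predictions_per_model: list of equally long lists, one list per model.
    weights: optional list with one non-negative weight per model (defaults to equal weights).
    """
    n_models = len(predictions_per_model)
    if n_models == 0:
        raise ValueError("need predictions from at least one model")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    n_rows = len(predictions_per_model[0])
    if any(len(preds) != n_rows for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of rows")

    total = sum(weights)
    # For each row, average the models' predictions using the normalised weights.
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_per_model)) / total
        for i in range(n_rows)
    ]
```

With equal weights this is a plain average of the models. Giving a higher weight to the model that scored better on validation data is a common refinement.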
There is big potential for improvement, and contributions are very welcome!
If you have any ideas, open an issue here: https://github.com/dan0nchik/SAP-HANA-AutoML/issues
P.S. This is a project by @While-true-codeanything and @dan0nchik, both very talented students…
Don’t wait: try it on your dataset and share your results…
I trained my first hana_automl regression on PAL 🙂
Big kudos for putting such a cool project together Dmitry Buslov
This is really great Dmitry, makes using PAL much easier in Python!