HANA AutoML library
Let’s assume you have to prepare a machine learning model for a classification or regression task.
All your data is already in HANA or in a flat (CSV) file.
Everything you need is here: https://github.com/dan0nchik/SAP-HANA-AutoML (this library is an open-source research project and is not part of any official SAP product).
That is a joke, of course, but hana_automl does go through all (well, not yet all) of the AutoML steps and makes data science work easier.
The library is written in Python and built on top of other awesome libraries:
- hana_ml
- Optuna
- BayesianOptimization
- Streamlit
Installation takes just two commands:
pip3 install Cython
pip3 install hana_automl
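To confirm that the installation worked, a quick import check can help. This is a minimal sketch using only the standard library's importlib.util.find_spec; the module names hana_ml and hana_automl are taken from the imports used later in this article, and check_installed is a hypothetical helper, not part of the library.

```python
import importlib.util


def check_installed(modules):
    """Return {module_name: True/False} depending on whether Python can locate each module.

    Hypothetical helper for illustration; it only looks modules up, it does not import them.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Module names as they are imported elsewhere in this article.
    status = check_installed(["Cython", "hana_ml", "hana_automl"])
    for name, ok in status.items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

If any entry prints "missing", rerun the corresponding pip3 command before continuing.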
After installation, it is easy to get started:
from hana_automl.utils.scripts import setup_user
from hana_ml.dataframe import ConnectionContext
cc = ConnectionContext(address='address', user='user', password='password', port=39015)
# replace with the credentials of the user that will be created (or granted a role) to run PAL.
setup_user(connection_context=cc, username='user_new', password="password_new")
setup_user is an optional helper for when you need to create a new user for experiments.
After that, all you need is fit/predict and a bit of waiting…
from hana_automl.automl import AutoML
model = AutoML(cc)
model.fit(
file_path='path to training dataset', # the training data may also be a HANA table/view or a pandas DataFrame
steps=10, # number of iterations
target='target', # column to predict
time_limit=120 # time limit in seconds
)
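The steps and time_limit arguments bound the search. As a rough mental model only (an assumption, not hana_automl's actual code), an AutoML search can be pictured as a loop that tries one candidate configuration per step and stops when either budget runs out. The sketch below uses plain random search over a toy objective; budgeted_search, toy_score and the search space are hypothetical, whereas the library itself builds on Optuna and BayesianOptimization and trains PAL models inside HANA.

```python
import random
import time


def budgeted_search(objective, space, steps, time_limit, seed=0):
    """Random search that stops after `steps` trials or `time_limit` seconds, whichever comes first.

    objective: callable taking a dict of parameters and returning a score (higher is better).
    space: dict mapping parameter name -> list of candidate values.
    Returns (best_params, best_score, trials_run).
    """
    rng = random.Random(seed)
    start = time.monotonic()
    best_params, best_score = None, float("-inf")
    trials = 0
    for _ in range(steps):
        if time.monotonic() - start > time_limit:
            break  # time budget exhausted before the step budget
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, trials


def toy_score(params):
    """Hypothetical objective: the best score (0) is reached at depth == 6 and lr == 0.1."""
    return -abs(params["depth"] - 6) - abs(params["lr"] - 0.1)


if __name__ == "__main__":
    space = {"depth": [2, 4, 6, 8], "lr": [0.01, 0.1, 0.3]}
    params, score, trials = budgeted_search(toy_score, space, steps=10, time_limit=120)
    print(params, score, trials)
```

With a generous time_limit the step count is the binding budget; with a tight one, fewer trials run. That is the same trade-off between speed and model quality discussed later in this article.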
Then predict:
model.predict(file_path='path to test dataset', id_column='ID', verbose=1)
You can find the full documentation here: https://sap-hana-automl.readthedocs.io/en/latest/index.html
It is also possible to run all these steps from a UI built with Streamlit instead of from Python.
The UI looks like this:
To start the UI, you need three steps:
- Clone the repository:
git clone https://github.com/dan0nchik/SAP-HANA-AutoML.git
- Install the dependencies from inside the cloned folder:
cd SAP-HANA-AutoML
pip3 install -r requirements.txt
- Run the GUI:
streamlit run ./web.py
OK, so why should you try it?
Have a look at this example: https://github.com/dan0nchik/SAP-HANA-AutoML/blob/main/comparison_openml.ipynb
APL (the Automated Predictive Library) is awesome, but it is strongly focused on speed; for more accurate models you need some time and PAL (the Predictive Analysis Library). This is where hana_automl can help.
It is also possible to build not just a single model but a blend of models. To enable the ensemble, pass ensemble=True to hana_automl.automl.AutoML.fit() when training the AutoML model.
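Blending combines the predictions of several trained models into one. How hana_automl blends its models internally is not described in this article, so the following is only a minimal sketch of two common blending rules: majority voting for classification and averaging for regression. The function names blend_classification and blend_regression are hypothetical.

```python
from collections import Counter
from statistics import mean


def blend_classification(predictions):
    """Majority vote per row.

    predictions: list of per-model label lists, all of equal length.
    On a tie, Counter.most_common returns the tied label first seen in model order.
    """
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]


def blend_regression(predictions):
    """Average the models' numeric predictions per row."""
    return [mean(row) for row in zip(*predictions)]


if __name__ == "__main__":
    clf_preds = [
        ["yes", "no", "yes"],  # model A
        ["yes", "yes", "no"],  # model B
        ["no", "yes", "no"],   # model C
    ]
    print(blend_classification(clf_preds))  # ['yes', 'yes', 'no']

    reg_preds = [
        [1.0, 2.0],  # model A
        [3.0, 4.0],  # model B
    ]
    print(blend_regression(reg_preds))  # [2.0, 3.0]
```

Voting and averaging tend to smooth out the individual errors of each model, which is the usual motivation for ensembles; weighted variants are also common.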
There is big potential for improvement, and contributions are very welcome!
If you have any ideas, open an issue here: https://github.com/dan0nchik/SAP-HANA-AutoML/issues
P.S. This is a project by @While-true-codeanything and @dan0nchik, two very talented students…
Don’t wait: try it on your dataset and share your results…
I trained my first hana_automl regression on PAL 🙂
Big kudos for putting such a cool project together Dmitry Buslov
This is really great, Dmitry; it makes using PAL much easier in Python!