Environment

Fukuhara · ‎09-06-2022

I am writing this blog to define HANA tables and upload data to the table using python package hana_ml.

Environment

Environment is as below.

Python: 3.7.13(Google Colaboratory)

HANA: Cloud Edition 2022.16

Python packages and their versions.

hana_ml: 2.13.22072200

pandas: 1.3.5

scikit-learn: 1.0.2

As for HANA Cloud, I activated scriptserver and created my users. Though I don't recognize other special configurations, I may miss something since our HANA Cloud was created long time before.

I didn't use HDI here to make environment simple.

Python Script

1. Install Python packages

Install python package hana_ml, which is not pre-installed on Google Colaboratory.

As for pandas and scikit-learn, I used pre-installed ones.

!pip install hana_ml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

Collecting hana_ml

  Downloading hana_ml-2.13.22072200-py3-none-any.whl (5.1 MB)

     |████████████████████████████████| 5.1 MB 4.2 MB/s 

Collecting hdbcli==2.13.13

  Downloading hdbcli-2.13.13-cp34-abi3-manylinux1_x86_64.whl (11.6 MB)

     |████████████████████████████████| 11.6 MB 15.5 MB/s 

Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from hana_ml) (3.2.2)

Collecting Deprecated

  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)

Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/python3.7/dist-packages (from hana_ml) (1.3.5)

Requirement already satisfied: tqdm>=4.43.0 in /usr/local/lib/python3.7/dist-packages (from hana_ml) (4.64.0)

Requirement already satisfied: numpy>=1.16.4 in /usr/local/lib/python3.7/dist-packages (from hana_ml) (1.21.6)

Requirement already satisfied: pydot in /usr/local/lib/python3.7/dist-packages (from hana_ml) (1.3.0)

Collecting schedule

  Downloading schedule-1.1.0-py2.py3-none-any.whl (10 kB)

Collecting htmlmin>=0.1.12

  Downloading htmlmin-0.1.12.tar.gz (19 kB)

Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.2->hana_ml) (2.8.2)

Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.2->hana_ml) (2022.2.1)

Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=0.24.2->hana_ml) (1.15.0)

Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.7/dist-packages (from Deprecated->hana_ml) (1.14.1)

Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->hana_ml) (1.4.4)

Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->hana_ml) (3.0.9)

Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->hana_ml) (0.11.0)

Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from kiwisolver>=1.0.1->matplotlib->hana_ml) (4.1.1)

Building wheels for collected packages: htmlmin

  Building wheel for htmlmin (setup.py) ... done

  Created wheel for htmlmin: filename=htmlmin-0.1.12-py3-none-any.whl size=27098 sha256=b0b9363455ff5e044042f13fa8c71be26012fbcb0c79f559aeaf857e0e4c861a

  Stored in directory: /root/.cache/pip/wheels/70/e1/52/5b14d250ba868768823940c3229e9950d201a26d0bd3ee8655

Successfully built htmlmin

Installing collected packages: schedule, htmlmin, hdbcli, Deprecated, hana-ml

Successfully installed Deprecated-1.2.13 hana-ml-2.13.22072200 hdbcli-2.13.13 htmlmin-0.1.12 schedule-1.1.0

2. Import modules

Import python package modules.

from hana_ml.dataframe import ConnectionContext, create_dataframe_from_pandas

import pandas as pd

from sklearn.datasets import make_classification

3. Connect to HANA Cloud

Connect to HANA Cloud and check its version.

ConnectionContext class is for connection to HANA.

HOST = '<HANA HOST NAME>'

SCHEMA = USER = '<USER NAME>'

PASS = '<PASSWORD>'

conn = ConnectionContext(address=HOST, port=443, user=USER,

                           password=PASS, schema=SCHEMA,

                           encrypt=True, sslValidateCertificate=False) 

print(conn.hana_version())

4.00.000.00.1660640318 (fa/CE2022.16)

4. Create test data

Create test data using scikit-learn.

There are 3 features and 1 target variable.

def make_df():

    X, y = make_classification(n_samples=1000, 

                               n_features=3, n_redundant=0)

    df = pd.DataFrame(X, columns=['X1', 'X2', 'X3'])

    df['CLASS'] = y

    return df



df = make_df()

print(df)

df.info()

Here is dataframe overview.

           X1        X2        X3  CLASS

0    0.964229  1.995667  0.244143      1

1   -1.358062 -0.254956  0.502890      0

2    1.732057  0.261251 -2.214177      1

3   -1.519878  1.023710 -0.262691      0

4    4.020262  1.381454 -1.582143      1

..        ...       ...       ...    ...

995 -0.247950  0.500666 -0.219276      1

996 -1.918810  0.183850 -1.448264      0

997 -0.605083 -0.491902  1.889303      0

998 -0.742692  0.265878 -0.792163      0

999  2.189423  0.742682 -2.075825      1



[1000 rows x 4 columns]

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1000 entries, 0 to 999

Data columns (total 4 columns):

 #   Column  Non-Null Count  Dtype  

---  ------  --------------  -----  

 0   X1      1000 non-null   float64

 1   X2      1000 non-null   float64

 2   X3      1000 non-null   float64

 3   CLASS   1000 non-null   int64  

dtypes: float64(3), int64(1)

memory usage: 31.4 KB

5. define table and upload data

Define HANA Table and upload data using function "create_dataframe_from_pandas".

The function is very useful, since it automatically define table and upload at the same time. Please check options for further detail.

TRAIN_TABLE = 'PAL_TRAIN'

dfh = create_dataframe_from_pandas(conn, df, TRAIN_TABLE,

                             schema=SCHEMA, 

                             force=True, # True: truncate and insert

                             replace=True) # True: Null is replaced by 0

6. Check upload result

check the result of HANA table definition and upload result. With HANA dataframe, python has connection to HANA table, so need to collect when getting data.

def show_hana_df(dfh):

    print(dfh.collect())

    print(f'Table Structure: {dfh.get_table_structure()}')

    print(dfh.describe().collect())



show_hana_df(dfh)

          X1        X2        X3  CLASS

0    0.964229  1.995667  0.244143      1

1   -1.358062 -0.254956  0.502890      0

2    1.732057  0.261251 -2.214177      1

3   -1.519878  1.023710 -0.262691      0

4    4.020262  1.381454 -1.582143      1

..        ...       ...       ...    ...

995 -0.247950  0.500666 -0.219276      1

996 -1.918810  0.183850 -1.448264      0

997 -0.605083 -0.491902  1.889303      0

998 -0.742692  0.265878 -0.792163      0

999  2.189423  0.742682 -2.075825      1



[1000 rows x 4 columns]

Table Structure: {'X1': 'DOUBLE', 'X2': 'DOUBLE', 'X3': 'DOUBLE', 'CLASS': 'INT'}

  column  count  unique  nulls      mean       std       min       max  \

0     X1   1000    1000      0 -0.022545  1.403956 -4.543441  4.020262   

1     X2   1000    1000      0  0.006131  0.987222 -3.019512  3.926238   

2     X3   1000    1000      0 -0.048433  1.322137 -3.836929  3.994644   

3  CLASS   1000       2      0  0.498000  0.500246  0.000000  1.000000   



     median  25_percent_cont  25_percent_disc  50_percent_cont  \

0 -0.197133        -1.054538        -1.056985        -0.197133   

1 -0.000251        -0.648000        -0.649373        -0.000251   

2 -0.224366        -1.021008        -1.021405        -0.224366   

3  0.000000         0.000000         0.000000         0.000000   



   50_percent_disc  75_percent_cont  75_percent_disc  

0        -0.197984         0.990549         0.990513  

1        -0.000709         0.666021         0.665924  

2        -0.229456         0.969732         0.967854  

3         0.000000         1.000000         1.000000

7. Close connection

Last but not least, closing connection explicitly is preferable.

conn.close()

Python hana_ml: define HANA table and upload(create_dataframe_from_pandas)

Environment

Python Script

1. Install Python packages

2. Import modules

3. Connect to HANA Cloud

4. Create test data

5. define table and upload data

6. Check upload result

7. Close connection

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win