This post shows how to visualize data with the Python package hana_ml. DatasetReportBuilder is an EDA (exploratory data analysis) function that generates a report on a dataset. I used the well-known Titanic dataset because it is easy to understand.
!pip install hana_ml
from hana_ml.dataframe import ConnectionContext, create_dataframe_from_pandas
from hana_ml.visualizers.dataset_report import DatasetReportBuilder
import pandas as pd
import seaborn as sns
HOST = '<HANA HOST NAME>'
SCHEMA = USER = '<USER NAME>'
PASS = '<PASSWORD>'
conn = ConnectionContext(address=HOST, port=443, user=USER,
                         password=PASS, schema=SCHEMA,
                         encrypt=True, sslValidateCertificate=False)
print(conn.hana_version())
4.00.000.00.1660640318 (fa/CE2022.16)
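The connection above uses placeholder strings for the host, user, and password. As a minimal sketch, assuming you would rather not keep credentials in the notebook, the same values could be read from environment variables. The function name `load_hana_credentials` and the variable names `HANA_HOST`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` are hypothetical and are not part of hana_ml.

```python
import os


def load_hana_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names used here are hypothetical; pick whatever
    names fit your own environment.
    """
    required = ("HANA_HOST", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_HOST"],
        "port": int(env.get("HANA_PORT", "443")),  # 443 matches the port used in this post
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
    }


# Demo with an explicit mapping so the sketch runs without real credentials.
demo = load_hana_credentials({
    "HANA_HOST": "example.invalid",
    "HANA_USER": "DEMO",
    "HANA_PASSWORD": "secret",
})
print(demo["address"], demo["port"], demo["user"])
```

The returned keys match the keyword arguments used in the ConnectionContext call in this post, so the dictionary could be unpacked into it with `**`, alongside the schema and encryption options.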
df = sns.load_dataset('titanic')
df.rename({'sex': 'gender'}, axis=1, inplace=True)  # seaborn's titanic dataset names this column 'sex'
print(df)
df.info()
survived pclass gender age sibsp parch fare embarked class \
0 0 3 male 22.0 1 0 7.2500 S Third
1 1 1 female 38.0 1 0 71.2833 C First
2 1 3 female 26.0 0 0 7.9250 S Third
3 1 1 female 35.0 1 0 53.1000 S First
4 0 3 male 35.0 0 0 8.0500 S Third
.. ... ... ... ... ... ... ... ... ...
886 0 2 male 27.0 0 0 13.0000 S Second
887 1 1 female 19.0 0 0 30.0000 S First
888 0 3 female NaN 1 2 23.4500 S Third
889 1 1 male 26.0 0 0 30.0000 C First
890 0 3 male 32.0 0 0 7.7500 Q Third
who adult_male deck embark_town alive alone
0 man True NaN Southampton no False
1 woman False C Cherbourg yes False
2 woman False NaN Southampton yes True
3 woman False C Southampton yes False
4 man True NaN Southampton no True
.. ... ... ... ... ... ...
886 man True NaN Southampton no True
887 woman False B Southampton yes True
888 woman False NaN Southampton no False
889 man True C Cherbourg yes True
890 man True NaN Queenstown no True
[891 rows x 15 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 survived 891 non-null int64
1 pclass 891 non-null int64
2 gender 891 non-null object
3 age 714 non-null float64
4 sibsp 891 non-null int64
5 parch 891 non-null int64
6 fare 891 non-null float64
7 embarked 889 non-null object
8 class 891 non-null category
9 who 891 non-null object
10 adult_male 891 non-null bool
11 deck 203 non-null category
12 embark_town 889 non-null object
13 alive 891 non-null object
14 alone 891 non-null bool
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.7+ KB
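The df.info() output shows that age, embarked, deck, and embark_town contain missing values. Since the upload in the next step uses replace=True, it may be worth looking at these counts in pandas first. The following is a minimal sketch on a small hand-made frame (not the real Titanic data); the helper name `missing_summary` is hypothetical.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column.

    Only columns with at least one missing value are listed,
    sorted so the column with the most gaps comes first.
    """
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Small illustrative frame; the values are made up for the demo.
sample = pd.DataFrame({
    "age": [22.0, None, 26.0, 35.0],
    "deck": [None, "C", None, None],
    "fare": [7.25, 71.28, 7.93, 53.10],
})
result = missing_summary(sample)
print(result)
```

On the real data you would call `missing_summary(df)` right after `df.info()`.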
TRAIN_TABLE = 'PAL_TRAIN'
dfh = create_dataframe_from_pandas(conn, df, TRAIN_TABLE,
                                   schema=SCHEMA,
                                   force=True,    # True: truncate and insert
                                   replace=True)  # True: Null is replaced by 0
print(f'Table Structure: {dfh.get_table_structure()}')
Table Structure: {'survived': 'INT', 'pclass': 'INT', 'gender': 'NVARCHAR(5000)',
'age': 'DOUBLE', 'sibsp': 'INT', 'parch': 'INT', 'fare': 'DOUBLE',
'embarked': 'NVARCHAR(5000)', 'class': 'NVARCHAR(5000)', 'who': 'NVARCHAR(5000)',
'adult_male': 'NVARCHAR(5000)', 'deck': 'NVARCHAR(5000)',
'embark_town': 'NVARCHAR(5000)', 'alive': 'NVARCHAR(5000)', 'alone': 'NVARCHAR(5000)'}
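In the table structure above, the pandas bool columns (adult_male, alone) and category columns (class, deck) were stored as NVARCHAR(5000), while int64 columns became INT. As a minimal sketch, assuming you prefer numeric flags, the dtypes could be normalized in pandas before the upload. The helper name `normalize_dtypes` is hypothetical, and how hana_ml maps the resulting dtypes should be checked with get_table_structure() rather than taken from this example.

```python
import pandas as pd


def normalize_dtypes(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with bool columns as int64 and category columns as object."""
    out = frame.copy()
    # True/False become 1/0 so the column is numeric.
    for col in out.select_dtypes(include="bool").columns:
        out[col] = out[col].astype("int64")
    # Categories become plain Python objects; missing values stay missing.
    for col in out.select_dtypes(include="category").columns:
        out[col] = out[col].astype("object")
    return out


# Small illustrative frame; the values are made up for the demo.
sample = pd.DataFrame({
    "adult_male": [True, False],
    "class": pd.Categorical(["Third", "First"]),
    "fare": [7.25, 71.28],
})
cleaned = normalize_dtypes(sample)
print(cleaned.dtypes)
```

The function works on a copy, so the original frame keeps its dtypes and can still be used for plotting in pandas or seaborn.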
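datasetReportBuilder = DatasetReportBuilder()
# add_id() adds an ID column, which the report uses as the key
datasetReportBuilder.build(dfh.add_id(), key="ID")
# Show the report inline in the notebook
datasetReportBuilder.generate_notebook_iframe_report()
# Save the report as an HTML file
datasetReportBuilder.generate_html_report('titanic')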
conn.close()