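Let's continue exploring hana_ml 2.6 in the context of my demo used in this year's SAP TechEd's DAT108 session.

This post focuses on dataframes in hana_ml. A HANA dataframe does not hold the data itself. It represents the SELECT statement backing the dataframe, and the data is transferred to the client only when you call the collect() method, which returns the result as a Pandas dataframe. Because the results end up in Pandas, you should have some basic understanding of the Pandas module.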
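To make the idea of a dataframe backed by a SELECT statement more concrete, here is a minimal sketch in plain Python. It uses the standard library's sqlite3 in place of SAP HANA, and the LazyFrame class is hypothetical: it is loosely modeled on the behavior described above, not on hana_ml's actual implementation. Each method only builds new SQL text, and nothing reaches the database until collect().

```python
import sqlite3


class LazyFrame:
    """Hypothetical stand-in for a HANA dataframe: it stores a SELECT
    statement and executes it only when collect() is called."""

    def __init__(self, conn, select_statement):
        self.conn = conn
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current statement in a sub-query; no SQL is executed here.
        return LazyFrame(
            self.conn,
            f"SELECT * FROM ({self.select_statement}) WHERE {condition}")

    def select(self, *columns):
        # Project columns, again only by building new SQL text.
        return LazyFrame(
            self.conn,
            f"SELECT {', '.join(columns)} FROM ({self.select_statement})")

    def collect(self):
        # Only now is the statement sent to the database.
        return self.conn.execute(self.select_statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PORTS (CODE TEXT, COUNTRY TEXT, RUNWAYS INTEGER)")
conn.executemany("INSERT INTO PORTS VALUES (?, ?, ?)",  # made-up rows
                 [("AAA", "US", 5), ("BBB", "US", 2), ("CCC", "DE", 4)])

ports = LazyFrame(conn, "SELECT * FROM PORTS")
big_us_ports = ports.filter("COUNTRY = 'US'").filter("RUNWAYS > 2").select("CODE")
print(big_us_ports.select_statement)  # just SQL text so far
print(big_us_ports.collect())         # [('AAA',)]
```

In hana_ml itself, the SQL behind a HANA dataframe can be inspected through its select_statement attribute.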
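First, upgrade hana_ml if needed: hana_ml 2.6 has been released since my previous post was published last week. I can see it using:

```
pip search hana
```

or, when Jupyter runs in the hmlsandbox01 Docker container, from the host:

```
docker exec hmlsandbox01 pip search hana
```

or from a Jupyter notebook cell:

```
!pip search hana
```

Note that PyPI has since disabled the search API that pip search relies on, so these commands now fail. With pip 21.2 or later, pip index versions hana-ml lists the available versions instead.

So, to upgrade the module let's run:

```
pip install --upgrade hana-ml
```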
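If you prefer to check from Python which version is installed, a small sketch with the standard library's importlib.metadata (Python 3.8+) could look like this. The helper names are hypothetical, and the comparison assumes purely numeric, dot-separated versions; for anything more complex, the packaging library's version parsing is the safer choice.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def _as_tuple(version: str) -> Tuple[int, ...]:
    # Assumes purely numeric components such as '2.6.1' (no 'rc', 'dev', ...).
    return tuple(int(part) for part in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    """True if a dotted numeric version is >= the minimum, e.g. '2.6.1' >= '2.6'."""
    return _as_tuple(version) >= _as_tuple(minimum)


current = installed_version("hana-ml")
if current is None:
    print("hana-ml is not installed")
elif is_at_least(current, "2.6"):
    print(f"hana-ml {current} is recent enough")
else:
    print(f"hana-ml {current} is older than 2.6: pip install --upgrade hana-ml")
```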
The shapely module is used by hana_ml to support geospatial data manipulation, but it must be installed separately to avoid errors like "name 'wkb' is not defined" or "ModuleNotFoundError: No module named 'shapely'". It is a known limitation and should be fixed in the next patch of hana_ml. For details on installing shapely, please follow https://shapely.readthedocs.io/en/stable/project.html#installing-shapely

```
pip install shapely
```
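Because a missing shapely module only surfaces as an error once geospatial functionality is used, it can help to check for it up front. A minimal sketch using importlib.util.find_spec from the standard library; the missing_modules helper is hypothetical.

```python
import importlib.util
from typing import List


def missing_modules(*names: str) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules("hana_ml", "shapely")
if missing:
    # pip normalizes '_' and '-', so 'hana_ml' installs the hana-ml package.
    print("Missing modules, install with: pip install " + " ".join(missing))
else:
    print("hana_ml and shapely are both available")
```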
The code for this post is in the notebook 01 Dataframes.ipynb.

Typically, hana_ml will be used against large volumes of data already stored in SAP HANA on-prem or in SAP HANA Cloud. But since we are starting with an empty trial instance of SAP HANA Cloud, we need to load some data first. I already showed how to quickly load CSV files into SAP HANA in my post Quickly load data with hana_ml...

```python
import pandas
pandas.__version__
```
I use the dfp_ notation for Pandas dataframes.

```python
# Nodes and edges of the air-routes sample graph from the krlawrence/graph repository
dfp_nodes=pandas.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-nodes.csv')
dfp_edges=pandas.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-edges.csv')
print('Size of nodes dataframe: {}'.format(dfp_nodes.shape))
print('Size of edges dataframe: {}'.format(dfp_edges.shape))
dfp_nodes
```
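What are the data types of the columns in the dataframes?

```python
dfp_nodes.dtypes
```

The type of each node is stored in the column ~label, so what are the node labels?

```python
dfp_nodes.groupby('~label').size()
```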
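To see what groupby('~label').size() returns without downloading the files, here is the same call on a tiny dataframe that mimics the column naming of the nodes file. The values are made up for illustration, not taken from the data set.

```python
import pandas as pd

# Made-up sample in the style of the air-routes nodes file.
dfp_sample = pd.DataFrame({
    "~id": [1, 2, 3, 4, 5],
    "~label": ["airport", "airport", "country", "continent", "version"],
    "code:string": ["AAA", "BBB", "XX", "YY", "V1"],
})

# One row per distinct label with the number of nodes carrying it;
# groupby sorts the labels alphabetically by default.
label_counts = dfp_sample.groupby("~label").size()
print(label_counts)

# value_counts() gives the same numbers, ordered by frequency instead.
print(dfp_sample["~label"].value_counts())
```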
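The nodes dataframe contains some columns (like type:string) as well as some rows (like those labeled continent or version) that we do not need. Additionally, all column names contain either special characters (like ~) or data types (like :string) that we do not need. Plus, some of the columns have data types too generic for their real content. And ideally, we need column names in all capitals for SAP HANA.

Let's create a new Pandas dataframe dfp_ports with airports only and check it!

```python
dfp_ports=(
    dfp_nodes[dfp_nodes['~label'].isin(['airport'])]
    # Drop the label and the columns we do not need
    .drop(['~label','type:string','author:string','date:string'], axis=1)
    # Let Pandas pick more specific dtypes, e.g. string instead of object
    .convert_dtypes()
)
# Remove '~' and the ':type' suffix, then capitalize, e.g. 'code:string' -> 'CODE'
dfp_ports.columns=(dfp_ports.columns
    .str.replace('~','', regex=False)
    .str.replace(':.*','', regex=True)  # needed: pandas 2.0 changed the default to regex=False
    .str.upper()
)
dfp_ports
```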
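The same pipeline applied to a handful of made-up rows shows what each step does, in particular how convert_dtypes() turns a float column that only holds whole numbers into the nullable Int64 type. The sample is hypothetical and only mimics the column layout of the nodes file.

```python
import pandas as pd

# Hypothetical rows imitating the air-routes nodes layout.
dfp_nodes_sample = pd.DataFrame({
    "~id": [1, 2, 3],
    "~label": ["airport", "airport", "continent"],
    "type:string": ["airport", "airport", "continent"],
    "code:string": ["AAA", "BBB", "EU"],
    "runways:int": [5, 4, None],  # the None makes this a float64 column
    "author:string": ["x", "x", "x"],
    "date:string": ["d", "d", "d"],
})

dfp_ports_sample = (
    dfp_nodes_sample[dfp_nodes_sample["~label"].isin(["airport"])]
    .drop(["~label", "type:string", "author:string", "date:string"], axis=1)
    .convert_dtypes()  # float64 holding 5.0 and 4.0 becomes Int64
)
dfp_ports_sample.columns = (
    dfp_ports_sample.columns
    .str.replace("~", "", regex=False)   # literal character
    .str.replace(":.*", "", regex=True)  # strip the ':type' suffix
    .str.upper()
)
print(dfp_ports_sample.dtypes)
```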
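Now let's look at dfp_edges.

```python
dfp_edges.dtypes
```

```python
dfp_edges.groupby('~label').size()
```

We need only the edges labeled route, with the same clean-up of the column names.

```python
dfp_routes=dfp_edges[dfp_edges['~label'].isin(['route'])].drop(['~label'], axis=1).copy()
dfp_routes.columns=(dfp_routes.columns
    .str.replace('~','', regex=False)
    .str.replace(':.*','', regex=True)
    .str.upper()
)
```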
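Why regex=True matters in the column clean-up: pandas 2.0 changed the default of str.replace to regex=False, so ':.*' would be searched for literally and the type suffixes would survive. A short comparison on edge-style column names (assumed here to mirror the header of the edges file):

```python
import pandas as pd

edge_columns = pd.Index(["~id", "~from", "~to", "~label", "dist:int"])
without_tilde = edge_columns.str.replace("~", "", regex=False)

# With regex=True, ':.*' removes the colon and everything after it.
as_regex = without_tilde.str.replace(":.*", "", regex=True).str.upper()
# With regex=False, the literal text ':.*' does not occur, so nothing changes.
as_literal = without_tilde.str.replace(":.*", "", regex=False).str.upper()

print(list(as_regex))    # ['ID', 'FROM', 'TO', 'LABEL', 'DIST']
print(list(as_literal))  # ['ID', 'FROM', 'TO', 'LABEL', 'DIST:INT']
```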
With the HANAML database user created in the previous post, let's switch to using it for further exercises.

```python
import hana_ml
print(hana_ml.__version__)

# Replace the placeholders with the SQL endpoint of your SAP HANA Cloud trial instance
hana_cloud_endpoint="<uuid>.hana.trial-<region>.hanacloud.ondemand.com:443"
hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")

cchc=hana_ml.dataframe.ConnectionContext(port=int(hana_cloud_port),  # pass the port as an integer
                                         address=hana_cloud_host,
                                         user='HANAML',
                                         password='Super$ecr3t!', #Should be your user's password 😉
                                         encrypt=True  # SAP HANA Cloud accepts only encrypted connections
                                        )

# List the tables in the current schema, i.e. the user's own schema HANAML
print(cchc.sql("SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES WHERE SCHEMA_NAME='{schema_name}'"
               .format(schema_name=cchc.get_current_schema()))
      .collect()
     )
```
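The endpoint parsing above relies on split(':') producing exactly two parts. A slightly more defensive sketch (the split_endpoint helper is hypothetical) splits on the last colon only, converts the port to an integer, and fails with a clear message when there is no port.

```python
from typing import Tuple


def split_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split 'host:port' into (host, port), using the last ':' only."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'host:port', got {endpoint!r}")
    return host, int(port)


host, port = split_endpoint("<uuid>.hana.trial-<region>.hanacloud.ondemand.com:443")
print(host)  # <uuid>.hana.trial-<region>.hanacloud.ondemand.com
print(port)  # 443
```

The resulting host and port can then be passed as address and port to ConnectionContext.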
The schema HANAML does not have any tables yet. So, let's save the data from Pandas dataframes to SAP HANA tables using hana_ml.

```python
# force=True lets the call overwrite an existing table with the same name
dfh_ports=hana_ml.dataframe.create_dataframe_from_pandas(cchc,
                                                         dfp_ports, 'PORTS',
                                                         force=True
                                                        )
dfh_routes=hana_ml.dataframe.create_dataframe_from_pandas(cchc,
                                                          dfp_routes, 'ROUTES',
                                                          force=True
                                                         )
```
I use the dfh_ notation for HANA DataFrame variables. Let's check the tables in the schema again.

```python
print(cchc.sql("SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES WHERE SCHEMA_NAME='{schema_name}'"
               .format(schema_name=cchc.get_current_schema()))
      .collect()
     )
```
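The queries above build SQL with str.format(), which is fine for a schema name returned by get_current_schema(). When the value comes from user input, binding it as a parameter is safer. A minimal sketch with sqlite3 as a stand-in, where a made-up TABLES table imitates the system view; hdbcli cursors also accept ? placeholders, and hana_ml's ConnectionContext exposes its hdbcli connection as the connection attribute (worth verifying against your hana_ml version).

```python
import sqlite3
from typing import List, Tuple


def list_tables(conn: sqlite3.Connection, schema_name: str) -> List[Tuple[str, str]]:
    # The schema name is bound by the driver via '?', not pasted into the SQL text.
    cursor = conn.execute(
        "SELECT SCHEMA_NAME, TABLE_NAME FROM TABLES "
        "WHERE SCHEMA_NAME = ? ORDER BY TABLE_NAME",
        (schema_name,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# Made-up table imitating the columns of the TABLES system view used above.
conn.execute("CREATE TABLE TABLES (SCHEMA_NAME TEXT, TABLE_NAME TEXT)")
conn.executemany("INSERT INTO TABLES VALUES (?, ?)",
                 [("HANAML", "ROUTES"), ("HANAML", "PORTS"), ("OTHER", "T1")])

print(list_tables(conn, "HANAML"))             # [('HANAML', 'PORTS'), ('HANAML', 'ROUTES')]
print(list_tables(conn, "HANAML' OR '1'='1"))  # [] because it is treated as a plain value
```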
To bring the data to the client, use the collect() method of the HANA dataframe. It returns the rows as a Pandas dataframe.

```python
print(dfh_ports.collect())
```