hana_ml
2.6, which I used as well in my demo in SAP TechEd’s DAT108 session.hana_ml
2.6 release several HANA DataFrame methods have been extended to support the processing of (geo-)spatial columns. And what is the better time to discuss this if not during Geography Awareness Week?02 Spatial.ipynb
.import pandas as pd
from hana_ml import dataframe as dfh
import hana_ml
, but aliases:pd
for Pandas, anddfh
for HANA DataFrame class.hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"
hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")
cchc=dfh.ConnectionContext(port=hana_cloud_port,
address=hana_cloud_host,
user='HANAML',
password='Super$ecr3t!',
encrypt=True
)
cchc.connection.isconnected()
mylat
and mylon
. In my case:mylat, mylon = (51.1, 16.9)
https://ifconfig.co
service used in the earlier post Allow connections to SAP HANA Cloud instance from selected IP addresses — using the command line.import requests, json
s = requests.Session()
mylat, mylon = [json.loads
(
s.get("https://ifconfig.co/json").text
)
.get(key) for key in ['latitude', 'longitude']
]
s.close()
print
the coordinates we got.print(mylat, mylon)
mylat
and mylon
properly set, we can now build a HANA DataFrame to query the nearest airport.dfh_nearport=(cchc
.table("PORTS")
.select("CODE", "DESC",
('New ST_Point("LON", "LAT").ST_SRID(4326)', "PORT_LOC"),
(
'New ST_POINT({lon}, {lat}).ST_SRID(4326).ST_Distance(New ST_Point("LON", "LAT").ST_SRID(4326))'
.format(lon=mylon, lat=mylat)
, "DIST_FROMME"
)
)
.sort("DIST_FROMME").
head(1)
)
PORTS
two existing columns CODE
, DESC
plus two calculated columns:PORT_LOC
with the spatial location of an airportDIST_FROMME
with the distance in meters (that's the default unit of distance) from my location to an airport using the spatial reference ID SRID 4326
.SELECT
statement.dfh_nearport.select_statement
collect()
are called.dfp_nearport=dfh_nearport.collect()
dfp_nearport
WRO
(as expected ?), located about 5.8
kilometers from me. But what is that array of bytes [1, 1, 0, 0, 0, 0, 0, 0, 192, 195, 226, 48, 64...
as the location?hana_ml
2.6 have been extended with the geo_cols
and srid
optional properties, like sql()
.dfp_nearport=(cchc
.sql(
'''SELECT TOP 1
CODE, DESC,
New ST_Point(LON, LAT).ST_SRID(4326) as "PORT_LOC",
New ST_POINT(16.9599, 51.0791).ST_SRID(4326).ST_Distance(New ST_Point(LON, LAT).ST_SRID(4326)) as "DIST_FROMME"
FROM "PORTS"
ORDER BY "DIST_FROMME" ASC
'''.format(mylon, mylat),
geo_cols=["PORT_LOC"],
srid=4326
)
.collect())
display(dfp_nearport)
POINT (16.88579940795898 51.10269927978516)
that can be copied and pasted e.g. in http://geojson.io/ -> Meta
-> Load WKT String
to be displayed on the map.geo_cols
with HANA DataFrame.select_statement
into SQL with geo_cols
, like:dfp_nearport=(cchc
.sql(dfh_nearport.select_statement,
geo_cols={"PORT_LOC": 4326})
.collect()
)
display(dfp_nearport)
geo_cols
syntax. It can be:geo_cols={"PORT_LOC": 4326}
,geo_cols=["PORT_LOC"], srid=4326
. In this example, it is a list of one element, but could be more.LAT
and LON
columns on the SAP HANA table PORT
. It would be more convenient to create a column with this ST_GEOMETRY(4326)
data type already at the data persistence.dfp_ports
, the same as we did in 01 Dataframes.ipynb
once again in the current Jupyter notebook.dfp_nodes=pd.read_csv('https://github.com/krlawrence/graph/raw/master/sample-data/air-routes-latest-nodes.csv')
dfp_ports=(
dfp_nodes[dfp_nodes['~label'].isin(['airport'])]
.drop(['~label','type:string','author:string','date:string'], axis=1)
.convert_dtypes()
)
dfp_ports.columns=(dfp_ports.columns
.str.replace('~','')
.str.replace(':.*','')
.str.upper()
)
PORTS
thanks to force=True
property) HANA DataFrame, indicating that a pair f columns LON
and LAT
are X and Y coordinates of a spatial point with Spatial Reference ID 4326
.dfh_ports=dfh.create_dataframe_from_pandas(cchc,
dfp_ports, "PORTS",
geo_cols=[("LON", "LAT")],
srid=4326,
force=True
)
POINT_LON_LAT_GEO
column, as seen by running a command:dfh_ports.columns
hana_ml
2.6 is the ability to load Esri shapefiles, which is a very popular format to exchange geospatial data. I was very excited when saw this added functionality! I hope you are (or will be) too.Shapes
and download the zipped file into that subfolder. This time I will do this right from my notebook's cell:!mkdir -p ./Shapes
!wget https://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip -O ./Shapes/TM_WORLD_BORDERS-0.3.zip
create_dataframe_from_shapefile()
method to persist the data in the SAP HANA table TM_WORLD_BORDERS
. You can notice the use of geo_cols
and srid
here as well.dfh_countries = dfh.create_dataframe_from_shapefile(
connection_context=cchc,
shp_file='./Shapes/TM_WORLD_BORDERS-0.3.zip',
table_name="TM_WORLD_BORDERS",
geo_cols=["SHAPE"],
srid=4326
)
Poland
in my case:dfh_countries.select("ISO2", "NAME", "SHAPE").filter("NAME='Poland'").collect()
ST_CoveredBy()
. Please note the use of the alias()
method to build the join of two HANA DataFrames dfh_ports
and dfh_countries
.(
dfh_ports.alias("P")
.join
(
dfh_countries.filter("NAME='Poland'").alias("C"),
condition='"P"."POINT_LON_LAT_GEO".ST_CoveredBy("C"."SHAPE")=1',
select=["P.*"]
)
).count()
COUNTRY
in the PORTS
table, but I hope you get the point of this last overengineering effort ?hana-ml-samples
repository: https://github.com/SAP-samples/hana-ml-samples/blob/main/Python-API/usecase-examples/multimodel-anal...."PUBLIC"."ST_UNITS_OF_MEASURE"
.ST_Distance()
.You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
40 | |
25 | |
17 | |
13 | |
7 | |
7 | |
7 | |
6 | |
6 | |
6 |