Usage of OSMnx via Python

mkemeter · ‎01-20-2020

After my recent blog about the integration with DBeaver and SAP HANA Spatial I received a couple of questions on how to upload OpenStreetMap data (i.e. Points of Interest, POI) to SAP HANA. In that blog I was using the OpenStreetMap POI data to analyze the distribution of beer gardens in Germany.

As mentioned previously, there is of course the straight-forward way of downloading files and uploading them to SAP HANA. A full list of supported import formats can be found in the documentation:

Supported Import and Export Formats for Spatial Data

If you are dealing with an Esri Shapefile which can for example be downloaded via Geofabrik you can find more information here:

Import ESRI Shapefiles using SQL Commands

However, if you are interested in a little more automation and are approaching this more from a developer side, you may be interested in the possibility of selecting and uploading OSM data with Python. To make this technical example more interesting, I am going to illustrate this with a little analysis on Germany's coverage with charging stations for electrical vehicles.

Usage of OSMnx via Python

Lately I was wondering what the most convenient way for retrieving and handling OSM data in Python would be. I found a blog on towardsdatascience.com which inspired me to look into OSMnx.

https://towardsdatascience.com/retrieving-openstreetmap-data-in-python-1777a4be45bb

So essentially the blog describes how to download OSM data to Python with just a few lines of code. Even more! It downloads the street networks into graph structures that you can use for further analysis. The second part of the blog also deals with Point of Interest (POI) data, which is essentially OSM point data without a network or graph relation.

Geoff Boeing, the initiator of OSMnx, says:

"OSMnx lets you download spatial geometries and construct, project, visualize, and analyze complex street networks. It allows you to automate the collection and computational analysis of street networks for powerful and consistent research, transportation engineering, and urban design."

Cool! Let's see how this works with SAP HANA.

As an example we are going to download the location and attribute data of all charging stations in Germany to obtain a coverage map and identify areas where you should not go with your electric vehicle. I am going to use

SAP HANA Express for handling the spatial data,

Jupyter Notebooks for writing the Python code

and DBeaver for a quick visualization.

You should be a little familiar with Python to follow the below steps. If you are not you can skip the Python parts and move on to the actual analysis in SAP HANA further below.

Python Dependencies

The code will be using OSMnx, of course. Furthermore we will do some data handling in Pandas and use SQLAlchemy with its SAP HANA dialect.

import osmnx as ox

import pandas as pd

import sqlalchemy

If you have not setup SQLAlchemy with SAP HANA before please follow this quick tutorial for installing the respective plugin:

https://developers.sap.com/tutorials/hxe-python-sqlalchemy-users.html

Download POI Data

Now let's download the locations of all charging stations in Germany. This is probably going to be a big bunch of code....

gdf_poi = ox.pois_from_place('Germany', amenities=['charging_station'])

Yes, that's really it. Ok, this is gonna last a while. For me it took almost 10 minutes to retrieve all the data and get the result as GeoPandas GeoDataFrame. You can add some cell magic to measure the execution time and also I will add one more statement which just returns the dimension of the GeoDataFrame.

%%time

gdf_poi = ox.pois_from_place('Germany', amenities=['charging_station'])

gdf_poi.shape

Looking at the output, I can see that I downloaded 9417 entries with 559 columns in about 10 minutes:

If you are curious you may list all the 559 columns with the following statement. Please note that the columns will be specific to the data you download. When retrieving another recordset the structure may look different.

list(gdf_poi.columns)

A list of all POI types ("amenities") can be found here:

https://wiki.openstreetmap.org/wiki/Key:amenity

To replicate the example from my previous blog, you would simply choose "biergarten" instead of "charging_station".

Adjusting Data Types and Structure

We do have all data already available in Python in a GeoDataFrame (gdf_poi). Since GeoPandas currently does not offer a possibility to conveniently write to a database, we first need to get rid of the Geospatial data type and simply convert it to a string. We will convert it back to a geometry as soon as the data arrived in SAP HANA.

df_poi = pd.DataFrame(gdf_poi)

df_poi["geometry"] = df_poi["geometry"].astype("str")

With the above statement we also create a standard Pandas DataFrame (df_poi), because Pandas does have a function to conveniently write data to a database using SQLAlchemy.

There are still two columns we need to remove. One is simply redundant - its name "fixme" appears twice in the dataset. Guessing from its name I can say:

It is probably not that important to us at the moment

It is presumably not yet fixed.

The other column ("nodes") contains an array datatype which is not supported by the database interface (although the array type is indeed supported by SAP HANA).

df_poi = df_poi.drop(columns=['fixme', 'nodes'])

Per default most data types in the DataFrame are set to "object". A few numerical types can be inferred by executing this statement:

df_poi = df_poi.infer_objects()

However, the string/varchar types will still remain an object. We will handle these a bit later while inserting into the database. You can check the individual data types with the statement below:

df_poi.dtypes

Upload POI data to SAP HANA

The Pandas DataFrame offers a to_sql function, which uses SQLAlchemy to persist a DataFrame in a database. To leverage this functionality we first need to establish a connection to SAP HANA given the following statement:

hdb_connection = sqlalchemy.create_engine('hana://%s:%s@%s:%s' % (hdb_user, hdb_password, hdb_host, hdb_port)).connect()

As mentioned above we still have some "string" columns, which have the data type "object". This is not necessarily an issue. If we would persist the current state to the database, this would simply result in some "CLOB" columns. However, handling of the data (e.g. filtering) will be nicer, if the data type is set correctly. We obtain all remaining object columns with this statement:

obj_cols = df_poi.select_dtypes(include=[object]).columns.values.tolist()

When now dumping the DataFrame to the database, we specify that all columns from that list should be treated as String(512) (VARCHAR(512) on DB side).

df_poi.to_sql(name = 'osm_poi', con = hdb_connection, if_exists = 'replace', chunksize = 100, dtype={c: sqlalchemy.types.String(512) for c in obj_cols})

After a few seconds the data will be available in SAP HANA - with exception to the geometrical data type, which we removed earlier. With our database connection still established, we can replace the existing column "geometry" (data type VARCHAR(512)) with a column "shape" (data type ST_GEOMETRY(3857)).

hdb_connection.execute("ALTER TABLE OSM_POI ADD (SHAPE ST_GEOMETRY(3857))")

hdb_connection.execute("UPDATE OSM_POI SET SHAPE = ST_GEOMFROMTEXT(GEOMETRY, 4326).ST_TRANSFORM(3857)")

hdb_connection.execute("ALTER TABLE OSM_POI DROP (GEOMETRY)")

Finally, we have the POIs saved in SAP HANA with a proper data format. If you simply wanted to achieve this, you can stop reading. However, if you are curious what to do with the charging station location, please continue reading.

Where to (NOT) Charge my Car.

If you followed the other steps upfront, you will have a table called OSM_POI in SAP HANA. This table contains all OpenStreetMap POIs which are classified as "charging_station". With this data we can do a simple estimation if there are blind spots in the German loading network coverage. Of course, since we did not import actual street connection this can only be a rough estimation.

Since we can online calculate the beeline distance between our points, we need to estimate the actual street distance. For places European places or locations with similar infrastructure, you would typically estimate this with a street factor of 1.3. This means that a beeline distance of 100 km would result in an estimated street distance of 100 * 1.3 = 130 km.

I am a bit pessimistic - so let's assume the following:

The effective range of my vehicle is 200 km, which seems a realistic value for a non-economic driver in a middle class vehicle.

We only want to include charging stations that are open 24/7 and are actually charging stations for cars.

We ignore technical specifics such as the type of plug or the voltage.

So, the charging station which are relevant to us are selected with the following statement.

SELECT *

FROM OSM_POI

WHERE CAR = 'yes' AND OPENING_HOURS = '24/7'

The result are 1366 stations in Germany.

Given my effective driving distance of 200km and considering the street factor, that mean I run into issues if two of the charging stations on my way have a higher beeline distance of 200 / 1.3 ~ 154km. In other words I run into issues in case I am further away than 77km from one charging station and the distance to the next one is higher than 77km.

If we draw a circle of 77km around each charging station the gaps on the map will show "problematic" areas in terms of coverage. This can be achieved with function ST_Buffer.

SELECT shape.ST_BUFFER(77000)

FROM OSM_POI

WHERE CAR = 'yes' AND OPENING_HOURS = '24/7'

When displaying the results we can already guess where this is going to.

Other than guessing we are interested in isolating the non-coverage areas with proper polygons. First step is to combine all circles to one polygon using ST_UnionAggr.

SELECT ST_UNIONAGGR(shape.ST_BUFFER(77000))

FROM OSM_POI

WHERE CAR = 'yes' AND OPENING_HOURS = '24/7'

What we see is a polygon covering the majority of Germany (fortunately, for any ev driver). The polygon has a certain number of interior rings, which show non-covered areas. To retrieve the number of interior rings, we can use the function ST_InteriorRings.

SELECT ST_UNIONAGGR(shape.ST_BUFFER(77000)).ST_NUMINTERIORRINGS()

FROM OSM_POI

WHERE CAR = 'yes' AND OPENING_HOURS = '24/7'

As a result you can see that there are 6 gaps in coverage of 24/7 charging stations in Germany. These rings can individually be selected by using method ST_InteriorRingN. The largest non-covered area is interior ring number 3.

SELECT ST_UNIONAGGR(shape.ST_BUFFER(77000)).ST_INTERIORRINGN(3)

FROM OSM_POI

WHERE CAR = 'yes' AND OPENING_HOURS = '24/7'

Ok, as a large part of this area is a military drill ground (left part of the polygon), it is anyway not a good idea to go there - even with your old Diesel vehicle.

On the other hand this area contains the small town of Tangermünde (right upper part of the polygon), which is - according some random travel article - the "nicest small town in Germany".

Anyway, if the effective range of your vehicle is above 250 km or you want to stay overnight in Tangermünde (because of the 24/7 restriction) you do not have to care. But that assumption would have caused a boring example for this blog...

Conclusion

We have seen two things:

Importing OSM POI data in SAP HANA by using OSMnx.

Rough estimation of Germany's coverage with charging stations using some simple SQL statements.

Especially on the second part one could think of using OSMnx to also import the street network (at least major streets and use SAP HANA's Graph Engine to do a proper analysis without estimating beeline distances. Frankly, I am running out of time to do this. It's possible and I herewith offer to send over a pack of 5 prestigious SAP HANA Spatial stickers to the first doing this and writing a blog about it!

This is what the sticker is going to look like!

Minimal code example to replicate the download and import

# SAP HANA credentials

hdb_host = 'xxx.xxx.xxx.xxx'

hdb_port = xxxxx

hdb_user = 'xxx'

hdb_password = 'xxx'



# Imports

import osmnx as ox

import pandas as pd

import sqlalchemy



# Download OSM POIs

gdf_poi = ox.pois_from_place('Germany', amenities=['charging_station'])



# Adjust data types and structure

df_poi = pd.DataFrame(gdf_poi)

df_poi["geometry"] = df_poi["geometry"].astype("str")

df_poi = df_poi.drop(columns=['fixme', 'nodes'])

df_poi = df_poi.infer_objects()



# Connect to SAP HANA

hdb_connection = sqlalchemy.create_engine('hana://%s:%s@%s:%s' % (hdb_user, hdb_password, hdb_host, hdb_port)).connect()



# Upload data

obj_cols = df_poi.select_dtypes(include=[object]).columns.values.tolist()

df_poi.to_sql(name = 'osm_poi', con = hdb_connection, if_exists = 'replace', chunksize = 100, dtype={c: sqlalchemy.types.String(512) for c in obj_cols})



# Add column with type ST_GEOMETRY in SAP HANA

hdb_connection.execute("ALTER TABLE OSM_POI ADD (SHAPE ST_GEOMETRY(3857))")

hdb_connection.execute("UPDATE OSM_POI SET SHAPE = ST_GEOMFROMTEXT(GEOMETRY, 4326).ST_TRANSFORM(3857)")

hdb_connection.execute("ALTER TABLE OSM_POI DROP (GEOMETRY)")

Where to (NOT) Charge my Car? - OpenStreetMap POIs in SAP HANA

Usage of OSMnx via Python

Python Dependencies

Download POI Data

Adjusting Data Types and Structure

Upload POI data to SAP HANA

Where to (NOT) Charge my Car.

Conclusion

Get Your SAP HANA Idea Incubator Badge Today!

SCN Mission - SAP HANA Quiz Challenge is now retired

Share your #HANAStory and Win