Technical Articles
Working with Esri shape files, Python and SAP HANA
Working with SHP files is not always as easy as it seems to be.
For example, we have a lot of interesting geo-spatial capabilities embedded in SAP HANA, but in one way or another we need to ingest this kind of information to the system. Usually, this spatial information is distributed on what is known as Shape files. Taking wikipedia definition, a shape file is:
The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
So, we have a great information richness when we find a good source of shape files and the next step is to query or to process this information on SAP HANA. You may start working with this information importing it using SAP HANA Studio… or you may start importing it with a Python script.
Note: This is a very naive approach for importing spatial information using Python. You may modify it for your convenience.
In order to import this information into SAP HANA we are going to use a Python library (PySHP) to read ESRI files, and we are going to combine it with HDBCli python implementation to comunicate with SAP HANA database.
Note: We are going to use SAP HANA Express Edition as an example, but the main thing is to establish a connection with your HANA system.
We start reading our shape file with python importing pyshp library in a python script. You may want to include unicode library in case your information no-geometry data contains special characters, this is my case.
import shapefile
import unicodedata
I prefer to define a variable with file path. In case of SHP files, you actually need more than one file to read the whole information, so it is sufficient to declare path file without extension; python library will do the rest. In this example, I’m going to work with file 01m.shp, so my FILE variable will be this:
FILE = "C:/spatialdata/data/00ent"
I will use with notation and make use of fields function
with shapefile.Reader(FILE, encoding="latin1") as shp:
fields = shp.fields
print(fields)
For this file, this is the console output:
[('DeletionFlag', 'C', 1, 0), ['CVEGEO', 'C', 2, 0], ['CVE_ENT', 'C', 2, 0], ['NOMGEO', 'C', 80, 0]]
If we want to import SHP file content to HANA, we need DDL instructions. If we don’t have it a priori, we can get it from that function result. From this last output we can see every field data type. According to pyshp library, this is the meaning:
- “C”: Characters, text.
- “N”: Numbers, with or without decimals.
- “F”: Floats (same as “N”).
- “L”: Logical, for boolean True/False values.
- “D”: Dates.
With this, we can build a basic DDL definition. For the completeness of SAP HANA DDL definition, we need to know which is the spatial reference identifier for this data set. So, this is our python code so far:
import shapefile
import unicodedata
FILE = "C:/spatialdata/data/00ent"
ESQUEMA = 'INEGI'
TABLA = '00ent'
SRID = 6372
def setVal(field):
if field[1] == 'C':
DEF = '"'+str(field[0])+'" NVARCHAR('+str(field[2])+'),'
elif field[1] == 'D' or field[1] == 'F':
DEF = '"'+str(field[0])+'" DECIMAL('+str(field[2])+','+str(field[3])+'),'
return DEF
def getDDL(esquema, tabla, srid, fields):
DDL = 'CREATE COLUMN TABLE "'+esquema+'"."'+tabla+'" ('
for field in fields[1:]:
DDL += setVal(field)
DDL += '"SHAPE" ST_GEOMETRY('+str(srid)+'))'
return DDL
with shapefile.Reader(FILE, encoding="latin1") as shp:
fields = shp.fields
records = shp.shapeRecords()
DDL = getDDL(ESQUEMA, TABLA, SRID, fields)
print(DDL)
To get the actual records value we can use shapeRecords() function; with records value we can generate the data insertion.
def doInsert(esquema, tabla, srid, records):
for r in records:
INSERT = 'INSERT INTO "'+esquema+'"."'+tabla+'" VALUES ('
INSERT += "'{0}'".format("','".join(r.record[:]))+","
if len(r.shape.parts) > 1:
INSERT += "NEW ST_MultiPolygon('MultiPolygon("
for t in range(len(r.shape.parts)-1):
l = ["%f %f" % p for p in r.shape.points[r.shape.parts[t]:r.shape.parts[t+1]]]
INSERT += "(({0})),".format(','.join(l))
l = ["%f %f" % p for p in r.shape.points[r.shape.parts[-1]:]]
INSERT += "(({0}))".format(','.join(l))
else:
INSERT += "NEW ST_POLYGON('POLYGON("
l = ["%f %f" % p for p in r.shape.points]
INSERT += "({0})".format(','.join(l))
INSERT += ")',"+str(srid)+"))"
return INSERT
And now we have
with shapefile.Reader(FILE, encoding="latin1") as shp:
fields = shp.fields
records = shp.shapeRecords()
DDL = getDDL(ESQUEMA, TABLA, SRID, fields)
INSERT = doInsert(ESQUEMA, TABLA, SRID, records)
Let’s generate the connection to SAP HANA so we can create the table and insert every shape founded in SHP Esri file. So:
from hdbcli import dbapi
connection = dbapi.connect(
address='hxehost',
port= 39015,
user='SYSTEM',
password='PASSWORD'
)
if connection.isconnected():
print("Conexion establecida")
else:
print("Error de conexion")
For SQL statements execution we must open a cursor and execute it
connection = dbapi.connect(
address='hxehost',
port= 39015,
user='SYSTEM',
password='PASSWORD'
)
if connection.isconnected():
print("Conexion establecida")
cursor = connection.cursor()
cursor.execute(DDL)
cursor.close()
else:
print("Error de conexion")
For the INSERT operation we have to modify our previous code to execute it on every record’s iteration.
Finally this is the whole Python code:
import shapefile
import unicodedata
from hdbcli import dbapi
FILE = "C:/spatialdata/data/00ent"
ESQUEMA = 'INEGI'
TABLA = 'INEGIPRUEBA'
SRID = 6372
def setVal(field):
if field[1] == 'C':
DEF = '"'+str(field[0])+'" NVARCHAR('+str(field[2])+'),'
elif field[1] == 'D' or field[1] == 'F':
DEF = '"'+str(field[0])+'" DECIMAL('+str(field[2])+','+str(field[3])+'),'
return DEF
def getDDL(esquema, tabla, srid, fields):
DDL = 'CREATE COLUMN TABLE "'+esquema+'"."'+tabla+'" ('
for field in fields[1:]:
DDL += setVal(field)
DDL += '"SHAPE" ST_GEOMETRY('+str(srid)+'))'
return DDL
def doInsert(esquema, tabla, srid, records, cursor):
for r in records:
INSERT = 'INSERT INTO "'+esquema+'"."'+tabla+'" VALUES ('
INSERT += "'{0}'".format("','".join(r.record[:]))+","
if len(r.shape.parts) > 1:
INSERT += "NEW ST_MultiPolygon('MultiPolygon("
for t in range(len(r.shape.parts)-1):
l = ["%f %f" % p for p in r.shape.points[r.shape.parts[t]:r.shape.parts[t+1]]]
INSERT += "(({0})),".format(','.join(l))
l = ["%f %f" % p for p in r.shape.points[r.shape.parts[-1]:]]
INSERT += "(({0}))".format(','.join(l))
else:
INSERT += "NEW ST_POLYGON('POLYGON("
l = ["%f %f" % p for p in r.shape.points]
INSERT += "({0})".format(','.join(l))
INSERT += ")',"+str(srid)+"))"
cursor.execute(INSERT)
with shapefile.Reader(FILE, encoding="latin1") as shp:
fields = shp.fields
records = shp.shapeRecords()
connection = dbapi.connect(
address='hxehost',
port= 39015,
user='SYSTEM',
password='PASSWORD'
)
if connection.isconnected():
print("Conexion establecida")
cursor = connection.cursor()
DDL = getDDL(ESQUEMA, TABLA, SRID, fields)
cursor.execute(DDL)
INSERT = doInsert(ESQUEMA, TABLA, SRID, records, cursor)
cursor.close()
else:
print("Error de conexion")
You can check the imported data from any client that is able to connect to your SAP HANA instance. If you want to visualize your data, it depends on SRID that you have used, but in general you need to transform it to SRID 4326. GEOJson format is also a good way to visualize your information.
For example, you can use this site to visualize your spatial data: geojson.tools. If you run this query on imported data you will have the necessary inputs for geojson.tools:
SELECT *, SHAPE.ST_TRANSFORM(4326).ST_ASGEOJSON() FROM INEGI.INEGIPRUEBA
These are the 32 polygons for each of every state in Mexico. This is how it looks Zacatecas on the map. Coordinates are coming from the imported data.
This geospatial information comes from this INEGI website.
Great article, Carlos! This will make life easier for every Python developer dealing with spatial data in SAP HANA!
Thank you, Mathias! I’m glad you like it.
Un articulo muy util Carlos...
¡Gracias, Rafa'!
Thank you very much for the code ! Very instructive.
Did you succeed in using the polygons in SAP Analytics Cloud ?
Yes I did.
Could you please describe the main additional steps? I followed another blog and manage to get points (ST_POINT) displayed in SAC but not the polygons (ST_GEOMETRY). I'm using a live connexion with hana 2.
This is the blog that I use as a reference when I need to work with polygons in SAC: https://blogs.sap.com/2018/11/15/sac-geo-spatial-custom-map-hierarchy/
Thank you for your answer Carlos. This blog relies on HANA Studio which is not available on CF... In Web IDE Full Stack, i created the SAP_BOC_SPATIAL folder and add a CV based on your table. I can see in SAC the SHAPE field in the location but that's all. The choropleth in SAC doesn't work on my side. Could you please indicate what you add in order to make it works ? Hierarchy, Shapepoint, is_leaf,... ?
I did everything as pointed in blog. There you can even find a SAP note reference, that states basically the same steps. I think one of the main steps is DU import, did you perform that step? If DU is not imported you won't be able to consume shape information on choropleth maps.