Linking Latitude and Longitude to Countries with SAP HANA Cloud
Today, a colleague approached me with a simple question:
Given a table with latitude and longitude values. How do I determine the respective country for each entry?
After a series of more advanced geospatial blog posts, I thought it’s about time to spend a post on this more basic, but likewise important question.
The actual solution of the question above is super-easy and comes down to a single select statement using a spatial join. So, the major part of this explanation will be about data import of the following files:
- We need a dataset with global latitude and longitude data. On Kaggle there is a nice data set with Global Power Plants. This dataset already contains a country column, which we will of course ignore (otherwise the question above does not make much sense). Instead of this, you can also use your own dataset, where the problem statement makes more sense.
- To determine to which country a coordinate belongs, we need to have an idea of the extent of each country. The easiest way is to upload a shapefile with global country borders to our SAP HANA Cloud database. A suitable file can be found here.
You can follow the steps below with SAP HANA Cloud Trial or Free Tier.
Import a CSV file with Latitude and Longitude Values
The first part is not even related to the initial question, since you would typically already have some lat/lon data in your database when asking this question. Still, I think it’s a good idea to make the upload part of the description, so that this blogs works as a self-contained end-to-end example.
Please skip this part in case you want to leverage your own lat/lon dataset.
- Download and unzip the Global Power Plants dataset from Kaggle (requires a free account).
- Open your SAP HANA Cloud instance in Database Explorer
- Right-click your instance and choose ‘Import Data’
- Choose ‘Import Data’ again
- Select the downloaded CSV file
- Choose your import target
- Adjust the type mapping. Make sure to change Capacity, Latitude and Longitude to Double and increase the length of the text types (e.g. to 500).
- Start the import and wait for the records to be imported into table ‘PLANTS’.
You wouldn’t believe it, but this was the most complex part! From now on, things will get easier.
Import a Shapefile with Global Country Borders
Next, we need to import the global country borders. The easiest way to do this is by using a standard exchange format for spatial data – the so called Shapefile.
- Download the shapefile representation of the World Administrative Boundaries.
- Do not unzip this file. The zip format is just fine for import.
- As before, open Database Explorer, right-click your instance and choose ‘Import Data’.
- This time select ‘Import Esri Shapefiles’
- Select the downloaded zip file
- Select the target schema for the import. I am using the same target as for the CSV import.
- As a Spatial Reference Identifier choose “WGS 84 (planar)” with SRID 1000004326.
- Start the import and wait for the records to be imported into table ‘world-administrative-boundaries’.
We have now imported all the data for our example.
Determine the Country for each Power Plant
Again, our power plant dataset already has the country information included. We will ignore this information and add the country by the knowledge of latitude and longitude information. Of course, we can use the country information of the dataset itself to double-check our results for correctness.
To match each lat/lon point with its country, we need to execute two spatial operations:
- Create a point geometry from lat/lon information. This can be done using SQL function ST_GeomFromText. The minimal usage example looks like the following:
SELECT ST_GeomFromText('POINT(8.642057 49.293432)', 1000004326) FROM DUMMY;
- We do a spatial join by predicate ST_Intersects to match two entries in the respective table in case the point is contained within the polygon of the country.
Knowing these two spatial functions, we can add the country information from table ‘world-administrative-boundaries’ to our data table by executing this SQL query:
SELECT wab."iso3", wab."name", p.* FROM PLANTS p LEFT JOIN "world-administrative-boundaries" wab ON ST_GeomFromText('POINT(' || p."Longitude" || ' ' || p."Latitude" || ')', 1000004326).ST_Intersects(wab.SHAPE) = 1
As a result we added columns “iso3” (the ISO code of the country) as well as the standardized “name” to our result set. We can also see that the information matches the datasets country in most cases. Exception can be found in the bordering regions of countries, where the level of detail of the used world border are probably not sufficient.
This solved our initial question. So everyone, who is short on time, can safely stop reading.
⚠️ In this example we are generating the geometries on the fly instead of using the built-in column type ST_Geometry. Thus SAP HANA cannot make use of spatial indices, which results in decreased performance.
I strongly recommend to use persistent geometries with type ST_Geometry for productive scenarios.
Add Additional Insights By Spatial Aggregation
Now, that we already have the data imported and sufficiently answered our initial question, we may as well add a little analysis on top. The following view helps us identify the countries with the most capacity by power plants. Note, that we aggregate over the spatial dimension.
CREATE OR REPLACE VIEW CAPA_PER_COUNTRY AS ( SELECT wab."iso3", wab."name", wab.SHAPE, SUM(p."Capacity (MW)") AS CAPACITY FROM PLANTS p JOIN "world-administrative-boundaries" wab ON ST_GeomFromText('POINT(' || p."Longitude" || ' ' || p."Latitude" || ')', 1000004326).ST_Intersects(wab.SHAPE) = 1 GROUP BY wab."iso3", wab."name", wab.SHAPE )
The result doesn’t come as a total surprise: The biggest countries are the ones with the highest capacity.
To make the results more interesting, we could generate the same report, but this time ordered by capacity per country size. To evaluate the size of a country, we can use spatial function ST_Area.
SELECT TOP 10 * FROM CAPA_PER_COUNTRY ORDER BY CAPACITY/SHAPE.ST_AREA() DESC
The result of this analysis looks significantly different.
We could have easily achieved the top list by capacity by just using the Global Power Plants dataset and without doing our spatial join. However, the last analysis (capacity per area) can only be done knowing the actual administrative boundaries.
You have seen how to upload spatial data to SAP HANA using the Database Explorer. We have imported a file with simple latitude/longitude values as well as a geospatial shapefile with complex country boundaries in form of polygons.
In a second step we have used a spatial join to match each lat/lon pair with the country it belongs to, which solved our initial question: Given a table with latitude and longitude values. How do I determine the respective country for each entry?
Last but not least, we have used a spatial aggregation to show the potential value of adding the country information and polygon to the original dataset. If you use a proper GIS client, you can even visualize the results with a choropleth map.
For more resources on SAP HANA Spatial check https://cutt.ly/saphanaspatial