
Geocoding is the process of taking address information and turning it into geographic coordinates that can be used to view that location on a map.  Reverse geocoding is the opposite process, where you start with a point on the globe (perhaps a latitude/longitude coordinate) and convert it into a textual address.  This blog explains how to use an XS JavaScript utility to reverse geocode data in your HANA tables.  This utility currently makes use of the Google Geocoding API but could easily be extended to use other service providers.  All the source code is available on GitHub, all feedback welcome!

Example

Here is an example of the utility in use.  I start off with table data like the below, where I have known latitude and longitude coordinates:

[Image: table before enrichment (01 before v2.png)]

Then I run the XS JS service with parameters specifying the source table, source fields and target fields.  The URL looks somewhat like this (the parameters are explained more fully below):


http://<server>:80<instance>/<package path>/geodataEnrich.xsjs?schema=GEODATAENRICH&table=
testtable03&maxrecs=10&log=active&fldblank=COUNTRY&fldstat=STATUS&fldlat=LATITUDE&fldlon=
LONGITUDE&fldcty=COUNTRY&fldad1=REGION&fldad2=SUBREGION&fldad3=CITY&fldpost=POSTCODE

Here is the same table after the service runs, complete with textual address data.  Easy peasy!

[Image: table after enrichment (02 after v2.png)]

The “STATUS” column holds the result of the geocode API call.  You can see “ZERO_RESULTS” for a point in the middle of the Atlantic Ocean.

The motivation for this utility came from some work I did analysing UK traffic accident data where the raw data contained latitude and longitude but not much else useful about the geographic location of accidents.  Then I read John Appleby’s blog about analysing weather data, where he made a suggestion about a re-usable service to reverse geocode data.  I realised that I’d already done a lot of what was mentioned there and making it more generic was not a big step.  Hence this utility.

Goals

My goals were to produce something that was production ready, not just a proof of concept or a demo, but an immediately usable XS JavaScript service.  This means supporting exception handling, trace files and a simulation mode, and giving due consideration to security.  It also means supporting “throttling” so that the API calls are not made to Google’s service too quickly (something mentioned in their terms of service).  I also wanted something that would be as easy to install as possible, which means as few files as possible, all able to be cut-and-pasted, with no need for package imports or configuration.  Finally I wanted something that would be easily extensible, so that more features could be added if it proved useful.

How to install and use

Get the source code from GitHub and create the 3 files in any XS project like this (the folder name does not matter):

[Image: project structure with the three files (03 project.png)]

The three files are as follows:

  • geocodeApiGoogle.xsjslib: XSJS library file that wraps the Google Geocode API (using the destination file below) and provides formatted reverse geocode results from a simple JS function call.
  • geocodeApiGoogleDest.xshttpdest: HTTP destination file, this is required by the XS engine to make calls to external URLs.
  • geodataEnrich.xsjs: the main XS service, this is what is called to do the work of reading and writing to your tables, making use of the XSJS library above.

You have to make one tiny edit to get everything plumbed in correctly.  In the geocodeApiGoogle.xsjslib file, go to line 22 and edit that line so that its first parameter contains your own project path (i.e. replace “geodataenrich.services” with your project name and path):


var dest = $.net.http.readDestination("geodataenrich.services", "geocodeApiGoogleDest");

Ok, with that done we’re ready to go.

Assumptions

I have made a bunch of assumptions about the tables and fields that can be used.  Firstly, the fields you write the resulting address data to must all be strings.  They must also be long enough to hold the resulting address data.  Rather than just blindly truncate, which could mess up subsequent analysis you may want to do, the program throws an error if any target field is too small.  In testing I’ve found 50 characters is sufficient to hold any result.  The final assumption is that the table being read from and written to must have a primary key.  The primary key doesn’t need to be specified, but it must exist to allow the SQL UPDATE calls to work.
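The “fail rather than truncate” behaviour described above can be sketched as a small helper.  This is illustrative only, not the utility’s actual code: the function name is hypothetical, and the column length is assumed to have been looked up from the table metadata elsewhere.

```javascript
// Minimal sketch of the length check: refuse to write a value that would be
// truncated by the target column. checkFitsColumn and its error text are
// illustrative; columnLength is assumed to come from the table metadata.
function checkFitsColumn(fieldName, value, columnLength) {
    if (value.length > columnLength) {
        throw new Error("Value for field " + fieldName + " is " + value.length +
                        " characters but the column only holds " + columnLength);
    }
    return value; // safe to use in the UPDATE
}
```

With 50-character columns, as suggested above, normal geocode results pass this check untouched, and an oversized result aborts the run instead of silently corrupting later analysis.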

Suggested Usage

Rather than just fire off the service on a mass of data, I’ve found it safer to do this in a series of steps.  Before going through these steps, let’s look at the complete set of URL parameters and what they do:

  • maxrecs (sample: 10): The maximum number of records to update, e.g. 1000.  At the time of writing the free Google service has a limit of 2,500 calls per day, or 100,000 calls per day for business accounts.  Defaults to 2500.
  • mindelayms (sample: 500): The minimum delay in milliseconds between each API call to the Google service.  Defaults to 500, meaning 500 milliseconds.
  • log (sample: active): Controls logging to screen or trace file.  Omit the parameter entirely for no logging, use log=active to see details on screen when the URL finishes, and use log=hana to write to the HANA trace file only (as an information-level trace record).
  • simulate (sample: active): Controls whether table UPDATEs are done.  Omit the parameter entirely to update records, use simulate=active to not do any updates.
  • schema (sample: TESTSCHEMA): Source schema name (required).
  • table (sample: testtable01): Source table name (required).
  • fldlat (sample: LATITUDE): Source field holding the latitude (required).
  • fldlon (sample: LONGITUDE): Source field holding the longitude (required).
  • fldcty (sample: COUNTRY): Name of the field in the source table that will receive the country address information (optional).
  • fldad1 (sample: REGION): Name of the field in the source table that will receive the admin level 1 information, like a region (optional).
  • fldad2 (sample: SUBREGION): Name of the field in the source table that will receive the admin level 2 information, like a sub-region (optional).
  • fldad3 (sample: SUBSUBREG): Name of the field in the source table that will receive the admin level 3 information, like a sub-sub-region or a city (optional).
  • fldpost (sample: POSTCODE): Name of the field in the source table that will receive the post code or zip code information (optional).
  • fldblank (sample: COUNTRY): Name of the field in the source table used to identify the records you want to write to; this prevents the same records being written to over and over again.  If a record contains NULL in this field, the service will attempt to write to all target fields; if the field holds anything other than NULL, the record will not be selected.  This field is required.
  • fldstat (sample: STATUS): Name of the field in the source table that will receive the status of the geocode API call (optional).  A successful geocode produces a value of “OK” in this field; if you go over your daily limit it will start holding “OVER_QUERY_LIMIT”.  The complete list of statuses is available in the Google documentation.
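The query string in the example URLs is just standard URL encoding of the parameters above, so it can be assembled mechanically rather than typed by hand.  A hypothetical helper (the parameter names are the real ones from the list above; the helper itself is illustrative):

```javascript
// Illustrative helper: build the geodataEnrich.xsjs query string from an
// options object, using the parameter names documented above.
function buildGeodataEnrichUrl(baseUrl, params) {
    var pairs = [];
    for (var name in params) {
        if (params.hasOwnProperty(name)) {
            pairs.push(encodeURIComponent(name) + "=" + encodeURIComponent(params[name]));
        }
    }
    return baseUrl + "?" + pairs.join("&");
}

var url = buildGeodataEnrichUrl("geodataEnrich.xsjs", {
    schema: "GEODATAENRICH", table: "testtable01",
    maxrecs: 10, simulate: "active", fldblank: "COUNTRY", fldcty: "COUNTRY"
});
// url now holds the full request path with the encoded parameters
```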

Ok, so now let’s work through the steps to safely reverse geocode your data.

Step 1

First run the service on a small sample of your data (parameter maxrecs=10), in simulate mode (parameter simulate=active) with logging to the screen (parameter log=active), to make sure the parameters in the URL are ok.  The processing rate defaults to being limited to 2 external API calls per second, which means a 500 millisecond delay between API calls, but you can change this with the parameter mindelayms.  The XS service sleeps until the requisite interval has elapsed.  Since we have logging on, we get a detailed view of what is going to happen:
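The throttling described above amounts to remembering when the last API call was made and not issuing the next one until mindelayms has elapsed.  A minimal sketch of the idea, assuming a busy wait on the clock (classic XSJS is synchronous and has no setTimeout); this is illustrative, not the utility’s exact code:

```javascript
// Illustrative throttle: block until at least minDelayMs has passed since
// lastCallTime, then return the new "last call" timestamp.
function throttle(lastCallTime, minDelayMs) {
    var now = Date.now();
    var wakeAt = lastCallTime + minDelayMs;
    while (now < wakeAt) {      // busy-wait on the clock
        now = Date.now();
    }
    return now;
}

// Usage: before each geocode API call
var last = 0;
last = throttle(last, 500);  // first call goes straight through
last = throttle(last, 500);  // second call waits ~500 ms
```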

[Image: logging output on screen (04 logging.png)]

Step 2

Next run the service in live mode (omit the simulate parameter) with logging to screen for 10 records.  If this completes without errors, then we’re good to go for a larger run.

Step 3

Finally run the service in live mode without logging (omit the log parameter) for the full 2500 records, or more.

Here are some sample URLs.  All URLs will start as usual:


http://<server>:80<instance>/<package path>/

The following URL will simulate the update of 10 records in table "GEODATAENRICH"."testtable01", with a 400 ms delay between calls, logging to the screen, and storing the result of the geocode API call in the field STATUS.  The field to select on is COUNTRY (i.e. search for records where COUNTRY is NULL) and the fields to write to are ZIP and COUNTRY:


geodataEnrich.xsjs?schema=GEODATAENRICH&table=testtable01&maxrecs=10&mindelayms=400&log=
active&simulate=active&fldblank=COUNTRY&fldstat=STATUS&fldpost=ZIP&fldcty=COUNTRY

The following URL will do a live update of 2000 records, with a 100 ms delay between calls and no logging.  The field to select on is COUNTRY and the fields to write to are POSTCODE, REGION and SUBREGION:



geodataEnrich.xsjs?schema=GEODATAENRICH&table=testtable01&maxrecs=2000&mindelayms=100&
fldblank=COUNTRY&fldpost=POSTCODE&fldad1=REGION&fldad2=SUBREGION


Google API Limitations

At the time of writing Google places some restrictions on their service.  Unless you have a business licence, you are limited to 2,500 calls per day.  Plus you are not supposed to make calls too frequently (although they don’t specify precisely what frequency is allowed).  They also state that the results should be destined for use on a Google map.  Read the API documentation to get the current limitations, which may well have changed since this blog was written.

Security

As I mentioned earlier, I wanted to give some consideration to security.  This utility contains UPDATE SQL calls and receives parameters from a URL (when you run the service) and from an external 3rd party service (when you receive geocode results).  These introduce some risks.  To mitigate them, firstly you should run the service logged in as a user with rights to update the tables you want, but ideally not every table.  Secondly, when you’re running the service make sure the URL you write is correct by testing in simulate mode and checking the logs written to the screen.  These logs show the SQL statements (or rather, templates for the SQL statements) that will be executed.  These risks are well inside your control, but one aspect that is not is the content of the geocode results.  What would happen if some nefarious 3rd party service returned address results that were designed to cause trouble when used in an SQL call?  I am reminded here of the xkcd cartoon about a kid known as Little Bobby Tables:

http://imgs.xkcd.com/comics/exploits_of_a_mom.png

The joke is of course about SQL injection, and in a similar way to the cartoon, you could imagine a deliberately evil address written in the same style as the kid’s name causing trouble.  If we were to blindly write “UPDATE TABLE SET FIELD = <whatever we got from the 3rd party service>” we would be at risk.  This risk goes away by parameterising the SQL statement like this: “UPDATE TABLE SET FIELD = ?” and then replacing the ‘?’ using XS API calls that insert properly escaped string values into the SQL.  This has been done throughout the code for all the values that are retrieved from 3rd party services, so we’re well protected against that risk.
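The “template” the logs show is just an UPDATE with one ? per target field.  A simplified sketch of building such a template (the function and field names here are illustrative; the real code uses the XSJS prepared-statement API to bind the values to the placeholders):

```javascript
// Illustrative: build a parameterised UPDATE template. The address values
// never appear in the SQL text itself; they are bound to the ? placeholders
// later, which is what defeats the "Bobby Tables" style of attack.
function buildUpdateTemplate(schema, table, targetFields, keyField) {
    var setClause = targetFields.map(function (f) {
        return '"' + f + '" = ?';
    }).join(", ");
    return 'UPDATE "' + schema + '"."' + table + '" SET ' + setClause +
           ' WHERE "' + keyField + '" = ?';
}

var sql = buildUpdateTemplate("GEODATAENRICH", "testtable01",
                              ["COUNTRY", "POSTCODE", "STATUS"], "ID");
// sql is now the template text; the geocode results are bound separately
```

Note the key field "ID" here is a hypothetical simplification: as mentioned in the assumptions, the utility does not require you to name the primary key, only that one exists.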

Improvements

There are a few obvious improvements that could be made.  More address fields could be added (e.g. locality, neighbourhood); there are many listed in the Google API documentation.  A UI might be nice, instead of dealing with crafting long URLs.  At present only the Google Geocode API is supported, but it is very easy to extend (just add a new pair of files like geocodeApiGoogle.xsjslib and geocodeApiGoogleDest.xshttpdest that return similar objects with the address data).  Also, only reverse geocoding is supported as that is all I required, but it would be moderately easy to support forward geocoding as it is just “going the opposite way”.  The code does not pass JSLint (I found JSLint pretty harsh), but it has been pretty well tested.  There are plenty of performance improvements that could be made (caching geocode API results and re-using them if subsequent points are very close), and it would also be possible to remove the current restriction that the table must have a primary key.
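The caching improvement mentioned above could be as simple as keying results on rounded coordinates, so that points within roughly the same small area reuse one geocode result.  A hypothetical sketch (the cache and function names are mine, and the lookup function stands in for the real call to the Google API wrapper):

```javascript
// Illustrative cache: round coordinates to 2 decimal places (roughly 1 km in
// latitude) so that nearby points share one geocode result. lookupFn is a
// stand-in for the real API call.
var geocodeCache = {};

function cacheKey(lat, lon) {
    return lat.toFixed(2) + "," + lon.toFixed(2);
}

function reverseGeocodeCached(lat, lon, lookupFn) {
    var key = cacheKey(lat, lon);
    if (!geocodeCache.hasOwnProperty(key)) {
        geocodeCache[key] = lookupFn(lat, lon);  // only call the API on a miss
    }
    return geocodeCache[key];
}
```

Two accident records a few hundred metres apart would then cost one API call instead of two, which also helps stay inside the daily quota.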

Well that’s it, the source code is there for anyone to use for any purpose, the only restriction is that I’m not held liable if it does not work as intended 😉 .  Let me know if anyone finds a use for this in a project.


9 Comments


  1. John Appleby

    Great job, thank you for doing this. I meant to do it myself, but well, I got distracted. There are so many HANA Enterprise apps where this could be used to add geo capabilities.

    Did you consider adding the latitude/longitude as a spatial type? I was thinking about this, and then the ability to use ESRI shapes to determine e.g. counties.

    1. Kevin Small Post author

      Hi John, thanks for the feedback.

      You’re right, supporting spatial types in the tables would be a worthwhile addition; unfortunately I was only able to read about the spatial processing features and not use them.  I did this as a side project on SP6, and spatial was only available then as an early adopter program.  Once I’ve upgraded to SP7 I will add this.

      Nice idea about using ESRI shapes to determine countries – that would then remove the limitations on speed/usage that the Google licence imposes.  That would be really great!  Maybe after I am on SP7 I can explore this.

      1. Aron MacDonald

        Brilliant stuff Kevin, I’ll definitely give that a try.

        You should also check out D3, which is a third party API already installed on HANA, which can draw some interesting maps (without the need for Google) as well as other cool visualizations. (Gallery · mbostock/d3 Wiki · GitHub).

        With a detailed enough world map in GeoJSON you could probably also use D3 to perform a reverse geocode. I’ve seen examples somewhere where you click on the D3 map and it detects the underlying country, based on the lat/long of the click.

        1. Kevin Small Post author

          Thanks Aron for feedback.  That sounds a good idea, I didn’t know D3 had geo capabilities, I will take a look.  So with John’s idea above that gives 2 potential routes to do reverse geocoding without relying on Google – nice!

          1. Aron MacDonald

            D3 is best at the interactive visualization part, but once you get hold of SPS7 you can try uploading shape files of the world (e.g. 1:10m Cultural Vectors | Natural Earth) and then check which shape (country) your point is located in, using SQL.

            e.g. Where is London? (Lat 51.85333 Long -0.22083)

            select id from SpatialShapes where shape.ST_Contains(NEW ST_Point( -0.22083, 51.85333 )) = 1;

            Thanks Jon-Paul for a similar XSJS use in your blog.

            The more detailed your shape files (and associated metadata) the more detailed info you could Reverse Geocode in HANA.

            BTW you may have noticed SPS7 is now available for developers so I can’t wait to see your next version  😉

            SAP HANA Developer Edition v1.7.70

            1. Kevin Small Post author

              Thanks for example, that looks pretty straightforward, a neat idea.

              Yeah I’d like to move to SP7, but I went for Cloudshare for hosting because I liked their flat-rate monthly billing and from what I can gather there aren’t any plans for them to move to SP7.  If they don’t move I will make the jump to AWS.

    1. Kevin Small Post author

      Thanks for feedback Jon-Paul, yes there look like lots of interesting use cases with spatial data, I just need to get updated to SP7.

