Here is an example of the utility in use. I start off with table data like the below, where I have known latitude and longitude coordinates:
Then I run the XS JS service with parameters specifying the source table, source fields and target fields. The URL looks somewhat like this (the parameters are explained more fully below):
http://<server>:80<instance>/<package path>/geodataEnrich.xsjs?schema=GEODATAENRICH&table= testtable03&maxrecs=10&log=active&fldblank=COUNTRY&fldstat=STATUS&fldlat=LATITUDE&fldlon= LONGITUDE&fldcty=COUNTRY&fldad1=REGION&fldad2=SUBREGION&fldad3=CITY&fldpost=POSTCODE
Here is the same table after the service runs, complete with textual address data. Easy peasy!
The “STATUS” column holds the results of the geocode API call. You can see “ZERO_RECORDS” for a point given in the middle of the Atlantic Ocean.
The motivation for this utility came from some work I did analysing UK traffic accident data where the raw data contained latitude and longitude but not much else useful about the geographic location of accidents. Then I read John Appleby‘s blog about analysing weather data, where he made a suggestion about a re-usable service to reverse geocode data. I realised that I’d already done a lot of what was mentioned there and making it more generic was not a big step. Hence this utility.
How to install and use
Get the source code from GitHub and create the 3 files in any XS project like this (the folder name does not matter):
The three files are as follows:
- geocodeApiGoogle.xsjslib: XSJS Library file that wraps the Google Geocode API (and the destination file below) and provides formatted reverse geocode results from a simple JS function call.
- geocodeApiGoogleDest.xshttpdest: HTTP destination file, this is required by the XS engine to make calls to external URLs.
- geodataEnrich.xsjs: the main XS service, this is what is called to do the work of reading and writing to your tables, making use of the XSJS library above.
You have to make one tiny edit to get everything plumbed in correctly, in the geocodeApiGoogle.xsjslib file, go to line 22 and edit this line to contain your project path as its first parameter (so you will repace “geodataenrich.services” with your project name and path):
var dest = $.net.http.readDestination("geodataenrich.services", "geocodeApiGoogleDest");
Ok, with that done we’re ready to go.
I have made a bunch of assumptions about the tables and fields that can be used. Firstly, the fields you write the resulting address data to are all strings. The fields you write address data to must also be long enough to hold the resulting address data. Rather than just blindly truncate, which could mess up subsequent analysis you may want to do, the program throws an error if any target field is too small. In testing I’ve found 50 characters is sufficient to hold any result. The final assumption is that the table being read from and written to must have a primary key. The primary key doesn’t need to be specified, but it must exist to allow the SQL UPDATE calls to work.
Rather than just fire off the service on a mass of data, I’ve found it safer to do this in a series of steps. Before going through these steps, lets look at the complete set of URL parameters and what they do:
||Explanation of Parameter
|maxrecs||10||The maximum records to update eg 1000. At time of writing the free Google service has a limit of 2500 calls per day, or 100000 calls for business accounts. Defaults to 2500.|
|mindelayms||500||The minimum delay in milliseconds between each API call to the Google service. Defaults to 500, meaing 500 milliseconds|
|Controls logging to screen or trace file. Omit the parameter entirely for no logging, use log=active to see details on screen when URL finishes, and use log=hana to write to the HANA trace file only (as an information level trace record).|
|simulate||active||Controls whether table UPDATEs are done. Omit the parameter entirely to update records, use simulate=active to not do any update.|
|schema||TESTSCHEMA||Source schema name (required)|
|table||testtable01||Source table name (required)|
|fldlat||LATITUDE||Source field holding latitude (required)|
|fldlon||LONGITUDE||Source field holding longitude (required)|
|fldcty||COUNTRY||Name of field in source table that will receive the Country address information (optional)|
|fldad1||REGION||Name of field in source table that will receive the admin level 1 information, like a region (optional)|
|fldad2||SUBREGION||Name of field in source table that will receive the admin level 2 information, like a sub-region (optional)|
|fldad3||SUBSUBREG||Name of field in source table that will receive the admin level 3 information, like a sub-sub-region or a city (optional)|
|fldpost||POSTCODE||Name of field in source table that will receive the post code or zip code information (optional)|
Name of field in source table that is used to identify records you want to write to, this is to prevent the same records being written to over and over again. If a record contains a value of NULL in this field, then this service will attempt to write to all target fields. If this field is filled with something not NULL, the record will not be selected. This field is required.
|fldstat||STATUS||Name of field in source table this will receive the status of the geocode API call (optional). A successful geocode result will produce a value of “OK” in this field. If you go over your daily limit this field will start holding “OVER_QUERY_LIMIT”. The complete list is available in the Google documentation.|
Ok, so now let’s work through the steps to safely reverse geocode your data.
First run the service on a small sample of your data (parameter maxrecs=10), in simulate mode (parameter simulate=active) with logging to the screen (parameter log=active), to make sure the parameters in the URL are ok. The processing rate defaults to being limited to 2 external API calls per second, which means a 500 millisecond delay between API calls, but you can change this with the parameter mindelayms. The XS service sleeps until the requisite interval has elapsed. Since we have logging on, we get a detailed view of what is going to happen:
Next run the service in live mode (omit the simulate parameter) with logging to screen for 10 records. If this completes ok without errors, then we’re good to go for a larger run.
Finally run the service in live mode without logging (omit the log parameter) for the full 2500 records, or more.
Here are some sample URLs. All URLs will start as usual:
The following URL will simulate the update of 10 records to table “GEODATA”.”testtable01″, with 400ms delay between calls, logging to screen, and storing
result of geocode API call in the field STATUS. The field to select on is COUNTRY (ie search for records with COUNTRY=NULL) and the fields to write to are ZIP and COUNTRY:
The following URL will do a live update of 2000 records, with 100ms delay between calls, with no logging. The field to select on is COUNTRY and the fields to write to are POSTCODE, REGION and SUBREGION:
Google API Limitations
At the time of writing Google place some restrictions on their service. Unless you have a business license, you need to limit to 2500 calls per day. Plus you are not supposed to make them too frequently (although they don’t specify precisely how frequently is allowed). They also state that the results should be destined for use on a Google map. Read the API documentation to get current limitations, which may well have changed since this blog was written.
As I mentioned earlier, I wanted to give some consideration to security. This utility contains UPDATE SQL calls and receives parameters from a URL (when you run the service) and parameters from an external 3rd party service (when you receive geocode results). These introduce some risks and to mitigate, firstly you should run the service logged in as a user with rights to update the tables you want, but ideally not every table. Secondly, when you’re running the service make sure the URL you write is correct by testing in simulate mode and checking the logs written to the screen. These logs show the SQL statements (or rather, templates for the SQL statements) that will be executed. These risks are well inside your control, but one aspect that is not in your control is the results of the geocode calls. What would happen if some nefarious 3rd party service returned address results that were designed to cause trouble when used in an SQL call? I am reminded here about the xkcd cartoon about a kid called Bobby Drop Tables:
The joke is of course about SQL injection, and in a similar way to the cartoon, you could imagine a deliberately evil address being written in a similar style to the kid’s name and causing trouble. If we were to blindly write “UPDATE TABLE SET FIELD = <whatever we got from the 3rd party service>” we are at risk. This risk goes away by parameterising the SQL statement like this “UPDATE TABLE SET FIELD = ?” and then replacing the ‘?’ by using XS API calls to insert properly escaped string values to the SQL. This has been done throughout the code for all the values that are retrieved from 3rd party services. So we’re protected against that risk well.
There are a few obvious improvements that could be made. More address fields could added (e.g. locality, neighbourhood), there are many listed in the Google API documentation. A UI might be nice, instead of dealing with crafting long URLs. At present only the Google Geocode API is supported, but it is very easy to extend (just add a new pair of files like geocodeApiGoogle.xsjslib and geocodeApiGoogleDest.xshttpdest that return similar objects with the address data). Also, only reverse geocoding is supported as that is all I required, but it would be moderately easy to support forward geocoding as it is just “going the opposite way”. The code does not pass JSLint (I found JSLint pretty harsh), but it has been pretty well tested. There are plenty of performance improvements that could be made (caching geocode API results and re-using them if subsequent points are very close) and it would also be possible to remove the current restriction that the table must have a primary key.
Well that’s it, the source code is there for anyone to use for any purpose, the only restriction is that I’m not held liable if it does not work as intended 😉 . Let me know if anyone finds a use for this in a project.