HANA Spatial Demos: Geocoding, Clustering, Aggrega...

former_member191806 · ‎04-30-2018

Recently, I participated in vitaliy.rudnytskiy's Code JAM on HANA Spatial and remembered how fun working with HANA Spatial can be. A while back, I've built some small demo apps as an introduction to spatial and I decided to share them with the community. After all, sharing is caring 🙂

All the code is freely available on GitHub: serban-petrescu/hana-spatial-demo. More detailed steps for setting it up can be found in the README. For the remainder of the blog, I will describe the rationale behind it and discuss on the more interesting snippets of code.

Business story

Let's imagine a purely fictional story to back-up our applications:

Our customer is an insurance company specialised in house insurance.

Currently, they have all their data in Excel files.

They want to move it into a dedicated system capable of grouping houses in the same area together.

Looking back, this scenario might not be that far from reality after all.

Objectives

To fulfil our customer's needs, we will do the following steps:

Take the data from the Excel file (.csv) and import it in HANA. This data contains:
- The insurance policy number (unique business key for a policy).
- The policy holder name.
- The insured object's address.
- And the total sum insured.

Geocode the house addresses to get their coordinates.

Build apps for visualising, aggregating and clustering the houses.

Technical setup

We will do all of this on a HANA Trial MDC, which we create following the steps described in the "Setup your trial SAP HANA MDC instance" SAP developer tutorial. The development itself can be done directly in the Web-based Workbench.

The Google Geocoding API is a fairly good tool for obtaining coordinates from textual addresses. To use it, we also need an API Key for it. In addition, HANA needs to "trust" the SSL certificate of Google, so we need to create a new trust store for it.

Data import

First we need to create a CDS entity for storing the input data:

    // the main table, should store all the necessary data

    entity eInsurance { 

        key policyNumber:       String(32);

            personName:         String(128) not null;

            objectAddress:      String(256) not null;

            insuredSum:         Decimal(16,2) not null;

            objectLocation:     hana.ST_POINT(4326);

            objectGeocoded:     hana.TINYINT default 0;

    };

Then we can use a table import file to move the data from the CSV to the table itself:

import = [

    {

        cdstable  = "workspace.master.data::ctxSpatial.eInsurance";

        schema = "WORKSPACE_MASTER_SPATIAL";

        file = "workspace.master.data:eHouses.csv";

        header = false;

        delimField = ",";

        delimEnclosing="\""; 

    } 

];

Geocoding

Normally, we could use HANA's built-in functionality: the GEOCODE INDEX with a user-defined geocoding provider. Back when I wrote these apps, on an MDC I was not able to actually change the system configuration to use my own provider, so I decided to just call my function through an XS Job:

{

    "description": "Perform geocoding",

    "action": "workspace.master.service:job.xsjs::batch",

    "schedules": [{

        "xscron": "* * * * * * */5"

    }]

}

Roughly speaking, the following code is responsible for calling the Google API:

/**

 * Calls the google geocoding API.

 * @param   {string}    sApiKey     The Google Maps API key.

 * @param   {string}    sAddress    The address which should be geocoded.

 * @returns {object|null}   The response body returned by the service.

 * Null if the request failed.

 */

function callGeocodingApi(sApiKey, sAddress) {

    try {

        var sUrl = "/geocode/json?address=" + encodeURIComponent(sAddress) + "&key=" + sApiKey,

            oDestination = $.net.http.readDestination("workspace.master.service", "googleApi"),

            oClient = new $.net.http.Client(),

            oRequest = new $.net.http.Request($.net.http.GET, sUrl),

            oResponse = oClient.request(oRequest, oDestination).getResponse(),

            oData = JSON.parse(oResponse.body.asString());

        return oData;

    }

    catch (e) {

        $.trace.error(e);

        return null;

    }

}

To track which addresses were already processed, I also added a objectGeocoded flag to the CDS entity.

Visualisation

To view the points on a simple Google Map, we need to decompose the ST_POINT objects into the latitude and longitude components and expose them through an OData service.

For this we can simply create a simple CDS view:

    // the expanded view (expands the objectLocation into coords)

    // contains only the points which were geocoded succesfully

    view vExpandedInsurance as select from eInsurance{

        policyNumber,

        personName,

        insuredSum,

        objectAddress,

        objectLocation.ST_Y() as objectLatitude,

        objectLocation.ST_X() as objectLongitude

    } where eInsurance.objectGeocoded = 1;

And then add it to the xsodata service:

"workspace.master.data::ctxSpatial.vExpandedInsurance" as "ExpandedInsurances"

   keys ("policyNumber")

   create forbidden

   update forbidden

   delete forbidden;

Using a simple UI5 app, we get a fairly nice view of the points:

As a bonus, if we click on a point, we get more details about the insurance policy behind it.

Data acquisition

We don't want our customer to always use the CSV import. It should just be there for the initial "migration" of the data.

When new policies are created, we would like to help our customer in getting decent-quality data, so we build a small UI which shows exactly where an address is when creating the policy:

Technically, this UI just reuses the existing geocoding functionality from before.

Clustering

One very interesting use case would be to cluster up the insured objects. Think about natural disasters; they have the potential of affecting a large number of houses located in relative close proximity. For an insurance company, having a large lump of risk in the same area can easily spell disaster.

Because we are not sure how the customer would like to have it, we build three different prototypes supporting this.

UI-based clustering

The first one is simply using Google's Marker Clustering functionality. We don't really like it though, because it will surely not scale and it doesn't really use anything from HANA. Also, we don't get to know exactly what's the total amount of sum insured in each cluster (just the number of houses).

Static clustering

The second one assumes that the user does not need to input the number of clusters, but we can decide on it. Truth be told, this variant only exists because, back when I created the applications, using a calculation view input parameter in place of the hard-coded cluster count would result in an error.

We write a fairly simple calculation view for this and then expose it through OData:

var_out = SELECT ST_ClusterID() AS "id", SUM("insuredSum") AS "totalSum", 

	COUNT("insuredSum") AS "objectCount", ST_ClusterCentroid().ST_Y() AS "centerLatitude",

	ST_ClusterCentroid().ST_X() AS "centerLongitude"

	FROM (

		SELECT "insuredSum", "objectLocation".ST_Transform(1000004326) AS "objectLocation" 

			FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance" 

			WHERE "objectLocation" IS NOT NULL

	)

	GROUP CLUSTER BY "objectLocation"

	USING KMEANS CLUSTERS 5;

As you can see, this uses the built-in KMEANS CLUSTER HANA spatial feature.

Dynamic clustering

We don't really like this either, because we hard-coded the cluster count. To work around the calculation view activation error, we move the code from the view into a simple XSJS script (with basically the same content, but with the cluster count parameterized). In the end, we obtain something that looks like this:

Aggregation

The customer might also want to select manually how to group the points together. To support this kind of scenario, we also create two different prototypes. For both prototypes, we want to:

Split all the points into two groups based on some kind of user input.

Display which points are in each group.

Aggregate the total sum insured per group.

Radius-based aggregation

In this prototype, we split the points into two groups based on their distance to a user selected point. If they are inside a circle of given radius and center, then they are in the "inside" group, otherwise they are "outside".

For determining this relationship, we use the following query that makes use of the ST_DISTANCE method:

 var_out = SELECT "policyNumber", 

	"objectLocation".ST_Y() AS "objectLatitude", 

	"objectLocation".ST_X() AS "objectLongitude", 

	CASE WHEN "objectLocation".ST_DISTANCE(NEW ST_POINT(

	    'POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') < :iv_radius 

	THEN 1 ELSE 0 END AS "group"

	FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"

	WHERE "objectLocation" IS NOT NULL;

Similarly, we build a view for the aggregated values. Here we can take into consideration the fact that we will always have two groups ("inside" and "outside"):

var_out = SELECT 1 AS "group", IFNULL(SUM("insuredSum"), 0) AS "totalSum", COUNT("policyNumber") AS "objectCount"

	FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"

	WHERE "objectLocation" IS NOT NULL AND "objectLocation".ST_DISTANCE(

		NEW ST_POINT('POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') < :iv_radius

UNION ALL

	SELECT 0 AS "group", IFNULL(SUM("insuredSum"), 0) AS "totalSum", COUNT("policyNumber") AS "objectCount"

	FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance" 

	WHERE "objectLocation" IS NOT NULL AND "objectLocation".ST_DISTANCE(

		NEW ST_POINT('POINT(' || :IV_LNG || ' ' ||  :IV_LAT || ')', 4326), 'meter') >= :iv_radius;

Then we just display these two sets of data on a simple "picker" UI:

Region-based aggregation

The second version is to actually let the user draw the region as a polygon. Thankfully, we can use the built-in ST_CoveredBy predicate to work this out:

var_out = SELECT "policyNumber", "objectLocation".ST_Y() AS "objectLatitude", 

	"objectLocation".ST_X() AS "objectLongitude",

	"objectLocation".ST_CoveredBy(NEW ST_POLYGON(:iv_polygon, 4326)) AS "group"

	FROM "WORKSPACE_MASTER_SPATIAL"."workspace.master.data::ctxSpatial.eInsurance"

	WHERE "objectLocation" IS NOT NULL;

The polygon itself is an input variable to this calculation view. The map includes a set of drawing tools which also support polygons. Using this functionality we can build a simple UI for this prototype as well:

Conclusion

With a fairly low amount of coding, we can easily build interactive insight tools based on the HANA Spatial functionalities. I hope you also find the examples cool!