Personal Insights

Your location history processed with SAP HANA – part 1

It all started with this question from Gregor Wolf:

“.@Sygyzmundovych have you ever tried to import the Google Location History from https://takeout.google.com/settings/takeout … into #HANA Geospacial?”

Well, in fact, I haven’t. But it turned out to be a good exercise in using two of SAP HANA’s multi-model processing capabilities: geospatial and JSON. As usual, I am using the free SAP HANA, express edition (HXE) here.

Requesting and uploading data

I have a private Google account, so I requested Location History and My Places (places I rated on Google Maps) data:

  • My Places is available as GeoJSON data, so it can be the topic of another post.
  • Location History is available either as plain JSON or as the KML geospatial format. I am going to request it in JSON format.

Once I received the requested data, I downloaded it to my HXE’s /usr/sap/HXE/HDB90/work/ folder and uncompressed it. Let’s look at the data structure.

hxeadm@hxehost:/usr/sap/HXE/HDB90/work/Takeout/Location History> head -20 Location\ History.json 
{
  "locations" : [ {
    "timestampMs" : "1550519193999",
    "latitudeE7" : 510754802,
    "longitudeE7" : 169642113,
    "accuracy" : 8,
    "velocity" : 0,
    "altitude" : 121,
    "verticalAccuracy" : 3
  }, {
    "timestampMs" : "1550519070999",
    "latitudeE7" : 510801796,
    "longitudeE7" : 169665475,
    "accuracy" : 8,
    "velocity" : 9,
    "heading" : 215,
    "altitude" : 123,
    "verticalAccuracy" : 3
  }, {
...

Fields I am interested in are:

  1. timestampMs is a UTC timestamp in milliseconds using the UNIX Epoch representation, i.e. "1550519193999" is February 18, 2019 7:46:33.999 PM GMT,
  2. latitudeE7 is the latitude multiplied by 10,000,000 (or 1e7) to represent it as an integer, i.e. "510754802" is a latitude of 51.0754802 degrees,
  3. longitudeE7 is the longitude multiplied by 10,000,000 in the same way.
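These decoding rules can be sketched in Python (a minimal illustration; the function name is mine, not part of the export):

```python
from datetime import datetime, timezone

def decode_location(loc):
    # timestampMs is milliseconds since the UNIX Epoch, stored as a string
    ts = datetime.fromtimestamp(int(loc["timestampMs"]) / 1000, tz=timezone.utc)
    # latitudeE7/longitudeE7 are degrees multiplied by 1e7, stored as integers
    lat = loc["latitudeE7"] / 1e7
    lon = loc["longitudeE7"] / 1e7
    return ts, lat, lon

record = {"timestampMs": "1550519193999",
          "latitudeE7": 510754802, "longitudeE7": 169642113}
ts, lat, lon = decode_location(record)
print(ts, lat, lon)  # → 2019-02-18 19:46:33.999000+00:00 51.0754802 16.9642113
```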

Loading data into SAP HANA

We could parse the data before loading it into the database, but loading it in the original format makes for a good example of processing JSON with SQL in SAP HANA.

CREATE SCHEMA "GOOGLE";

CREATE COLUMN TABLE "GOOGLE"."DUMP"(
	"DATA" CLOB NOT NULL
);

IMPORT FROM CSV FILE '/usr/sap/HXE/HDB90/work/Takeout/Location History/Location History.json' INTO "GOOGLE"."DUMP"
   WITH RECORD DELIMITED BY '\r'
   FIELD DELIMITED BY '\t';

It is not a CSV file, but I pretended it was one: a single record with a single field. That’s why I picked delimiters that are not present in the file.
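The precondition for this trick is easy to verify up front; a minimal sketch (the function name is mine) that checks neither delimiter occurs in the data:

```python
def delimiters_are_safe(text, record_delim="\r", field_delim="\t"):
    # The pretend-CSV import is only lossless if neither delimiter appears in the file
    return record_delim not in text and field_delim not in text

sample = '{ "locations" : [ { "timestampMs" : "1550519193999" } ] }'
print(delimiters_are_safe(sample))  # → True
```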

Convert JSON to relational data using SQL

Next, I want to convert the JSON into records of a LOC_HISTORY table, each with a timestamp and a geographical position represented by the ST_POINT data type.

CREATE TABLE "GOOGLE"."LOC_HISTORY" AS
(SELECT 
	TO_SECONDDATE(ADD_SECONDS('1970-01-01 00:00:00', "TSMS" / 1000)) as "TIMESTAMP",
	new ST_Point("LONE7"/1e7, "LATE7"/1e7) as "POINT"
FROM JSON_TABLE("GOOGLE"."DUMP"."DATA", '$.locations[*]'
COLUMNS
    (
        RN FOR ORDINALITY,
        TSMS BIGINT PATH '$.timestampMs',
        LATE7 BIGINT PATH '$.latitudeE7',
        LONE7 BIGINT PATH '$.longitudeE7',
        ALT INT PATH '$.altitude'
    )
)
);

I used the JSON_TABLE() function. Please note that I am not using HANA’s Document Store for this exercise, only the relational engine and its SQL.
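For comparison, the flattening that JSON_TABLE() performs can be sketched in plain Python (the helper name is mine; the field names come from the export above):

```python
import json
from datetime import datetime, timezone

def flatten_locations(dump):
    # One output row per element of $.locations[*], mirroring the JSON_TABLE() columns
    rows = []
    for loc in json.loads(dump)["locations"]:
        rows.append({
            "TIMESTAMP": datetime.fromtimestamp(int(loc["timestampMs"]) / 1000,
                                                tz=timezone.utc),
            # ST_Point takes (longitude, latitude), so keep that order here too
            "POINT": (loc["longitudeE7"] / 1e7, loc["latitudeE7"] / 1e7),
        })
    return rows

dump = ('{"locations":[{"timestampMs":"1550519193999",'
        '"latitudeE7":510754802,"longitudeE7":169642113,"accuracy":8}]}')
rows = flatten_locations(dump)
print(rows[0]["POINT"])  # → (16.9642113, 51.0754802)
```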

I can see the data in the table now. My location data is available from 2013 until yesterday.

Exercise 1: Reconstructing the walk

Let’s see. Last Sunday, February 17th, I went on my Instagram hunt between 10:27 and 11:42 local time (CET time zone), while my children were attending UniKids classes. I just need to rework a procedure from my earlier blog Calculating a path from IoT GPS data stored as ST_Point.

CALL "GOOGLE"."PATH_FROM_POINTS"(
	TS_START => LOCALTOUTC(To_TimeStamp('2019-02-17 10:27:00'), 'CET'),
	TS_END => LOCALTOUTC(To_TimeStamp('2019-02-17 11:42:00'), 'CET'),
	PATH => ?
);
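LOCALTOUTC shifts the CET wall-clock times to UTC before they are compared with the stored timestamps. The same conversion, sketched in Python (in February CET is a fixed UTC+1 offset, with no DST in effect):

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # fixed UTC+1; valid for a February date (no DST)

def local_to_utc(ts_local, tz):
    # Interpret a naive timestamp in the given zone, then express it as naive UTC
    return ts_local.replace(tzinfo=tz).astimezone(timezone.utc).replace(tzinfo=None)

print(local_to_utc(datetime(2019, 2, 17, 10, 27), CET))  # → 2019-02-17 09:27:00
print(local_to_utc(datetime(2019, 2, 17, 11, 42), CET))  # → 2019-02-17 10:42:00
```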

And the result in EWKT (extended well-known text) format is:

SRID=4326;LINESTRING (17.06334400177002 51.11322212219238,17.064226150512695 51.1140832901001,17.066997528076172 51.115102767944336,17.063793182373047 51.11575794219971,17.066638946533203 51.11593723297119,17.06586265563965 51.11723232269287,17.062926292419434 51.11690616607666,17.05934238433838 51.1182861328125,17.057364463806152 51.1193790435791,17.056079864501953 51.120232582092285,17.054750442504883 51.12098693847656,17.05545997619629 51.12276363372803,17.056791305541992 51.12459754943848,17.05857276916504 51.12675952911377,17.05574607849121 51.12709903717041,17.054173469543457 51.12527084350586,17.052987098693848 51.12366008758545,17.052385330200195 51.12179660797119,17.053675651550293 51.119704246520996,17.056140899658203 51.11831569671631,17.057037353515625 51.11619281768799,17.060548782348633 51.11597728729248,17.06048011779785 51.11384963989258,17.06330394744873 51.11301517486572)

This can be easily visualised using the geojson.io website:
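Since geojson.io natively speaks GeoJSON, one way to get the line there is to convert the EWKT output first; a minimal sketch (the function name is mine, and it only handles the LINESTRING case):

```python
import json

def linestring_ewkt_to_geojson(ewkt):
    # 'SRID=4326;LINESTRING (lon lat, lon lat, ...)' → GeoJSON geometry dict
    wkt = ewkt.split(";", 1)[1]
    coords_text = wkt[wkt.index("(") + 1 : wkt.rindex(")")]
    coords = [[float(v) for v in pair.split()] for pair in coords_text.split(",")]
    return {"type": "LineString", "coordinates": coords}

geom = linestring_ewkt_to_geojson(
    "SRID=4326;LINESTRING (17.0633 51.1132, 17.0642 51.1140)")
print(json.dumps(geom))
```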

Location History stores a location roughly every 3 minutes, so the line is not smooth in the visualisation, but it does reflect the path I took, as a comparison with the map generated from Endomondo data for the same walk shows:

 


In the second part of this blog post I do some spatial clustering on this data set.

‘Till next #GeospatialTuesday then,
-Vitaliy, aka @Sygyzmundovych

3 Comments
  • Hi Witalij,

    finally I found some time to try out your blog. I’m trying to use the HDI Container available in the SAP Cloud Platform Cloud Foundry trial for that. First I converted the JSON into a CSV for easier import into HANA:

    cat Standortverlauf.json | jq -r '.locations | (map(keys) \
    | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows \
    | $cols, $rows[] | @csv' > Standortverlauf.csv

    As I don’t have access to a file system local to the HANA DB, I’ve installed the SAP Data Hub developer edition and created this graph, which reads the CSV in 1 MB chunks while the HANA client processes 100 rows at a time:

    Using that, my 2,032,779 rows were imported in just about 15 minutes:

    And I could start the conversion into points:

    CREATE COLUMN TABLE LOC_HISTORY AS
    (SELECT 
    	TO_SECONDDATE(ADD_SECONDS('1970-01-01 00:00:00', "TIMESTAMPMS" / 1000)) as "TIMESTAMP",
    	new ST_Point("LONGITUDEE7"/1e7, "LATITUDEE7"/1e7) as "POINT"
    FROM LOC_HISTORY_IMPORT);

    which took only 9 seconds, and now my complete location history from January 2011 to February 2019 is in the LOC_HISTORY table:

    But it seems that the HANA database used for the HDI container is not yet on SAP HANA 2.0 SP 3, because I get the following error when I try to create the procedure:

    Could not execute 'CREATE PROCEDURE "PATH_FROM_POINTS" (OUT path CLOB, IN ts_start NVARCHAR(27) DEFAULT '0001-01-01 ...'
    
    Error: (dberror) [328]: invalid name of function or procedure: ST_ADDPOINT: line 20 col 26 (at pos 608)

    the source I’m using is:

    CREATE PROCEDURE "PATH_FROM_POINTS" 
    (OUT path CLOB, 
     IN ts_start NVARCHAR(27) DEFAULT '0001-01-01 00:00:00.0000000', 
     IN ts_end NVARCHAR(27) DEFAULT '9999-12-31 23:59:59.9999999'
    )
     LANGUAGE SQLSCRIPT 
     READS SQL DATA 
     AS
     BEGIN
     	DECLARE path_temp ST_GEOMETRY;
        DECLARE CURSOR c_points FOR
           	    SELECT "POINT" FROM "LOC_HISTORY"
           	    WHERE "TIMESTAMP" BETWEEN To_TimeStamp(ts_start) AND To_TimeStamp(ts_end)
           		ORDER BY "TIMESTAMP";
           		
     	SELECT ST_GeomFromEWKT('SRID=4326; POINT EMPTY') INTO path_temp FROM dummy;
    
        FOR cur_row AS c_points
    	DO
    		path_temp := :path_temp.ST_AddPoint(cur_row."POINT", -1);
        END FOR;
    
    	path := path_temp.ST_asEWKT();
     END;

    Maybe you have a tip there.

    Best regards
    Gregor