In part 1 we requested and loaded Location History from a personal account in Google. There are too many single points (recorded locations) to efficiently visualize collected data. Let’s have a look at the data using geospatial clustering available in SAP HANA 2.0 SPS 03.
Let’s continue with exercises.
Exercise 2: K-means clustering
K-means tries to find an assignment of points to clusters, so that the sum of squared distances of the points to the center of the cluster they belong to is minimal.
Let’s have a look at high-level clusters of all my locations for the complete history.
--KMeans select ST_UnionAggr("ConvexHull") from (select ST_CLUSTERID() AS "CID", ST_CONVEXHULLAGGR("POINT").st_srid(4326) AS "ConvexHull", COUNT(*) AS "Number of points in this cluster" from "GOOGLE"."LOC_HISTORY" GROUP CLUSTER BY "POINT" USING KMEANS CLUSTERS 3 order by 3 desc);
Points have been roughly clustered into Central and Western Europe, Eastern Europe and Asia, the USA (using GeoJson.io to visualize the SQL output).
Exercise 3: Grid clustering
Grid clustering provides a quick and easy way to use clustering. It is useful for providing a first impression.
--Grid select ST_UnionAggr("Envelope") from ( select ST_CLUSTERID() AS "CID", ST_CLUSTERENVELOPE() AS "Envelope", COUNT(*) AS "Number of points in this cluster" from "GOOGLE"."LOC_HISTORY" GROUP CLUSTER BY "POINT" USING GRID X CELLS 160 Y CELLS 80 order by 3 desc );
This time the grid clustering will give us better overview of the geography of recorded locations.
So, it is not like I’ve traveled half of Asia, but only visited Bengaluru and Istanbul 🙂
Exercise 4: DBSCAN clustering
Density-based spatial clustering of applications with noise (DBSCAN) is best suited to non-spherical clusters. It should works nicely if I want to cluster points along the routs I’ve been taking within 450 kilometers from my home town Wrocław (the center of the city is roughly 51.11N 17.035E).
--DBSCAN SELECT st_unionAggr("cluster").st_asWKB() from ( SELECT "cluster_id", st_unionAggr("POINT").ST_AlphaShape(0.055) as "cluster" from ( SELECT ST_ClusterID() OVER (CLUSTER BY "POINT" USING DBSCAN EPS 0.001 MINPTS 6) AS "cluster_id", "POINT" FROM "GOOGLE"."LOC_HISTORY" WHERE "POINT".ST_SRID(4326).ST_Distance(NEW ST_Point(17.035, 51.11).st_srid(4326),'kilometer') < 450 ORDER BY 1 ) where "cluster_id" <> 0 group by "cluster_id");
And here is the result.
Maybe next time I should look at the My Places dataset from Google Takeout.