Skip to Content
Personal Insights

Your location history processed with SAP HANA – part 2: geospatial clustering

In part 1 we requested and loaded Location History from a personal account in Google. There are too many single points (recorded locations) to efficiently visualize collected data. Let’s have a look at the data using geospatial clustering available in SAP HANA 2.0 SPS 03.

Let’s continue with exercises.

Exercise 2: K-means clustering

K-means tries to find an assignment of points to clusters, so that the sum of squared distances of the points to the center of the cluster they belong to is minimal.

Let’s have a look at high-level clusters of all my locations for the complete history.

--KMeans
select ST_UnionAggr("ConvexHull") from
(select ST_CLUSTERID() AS "CID",
	ST_CONVEXHULLAGGR("POINT").st_srid(4326) AS "ConvexHull",
	COUNT(*) AS "Number of points in this cluster"
 from "GOOGLE"."LOC_HISTORY"
 GROUP CLUSTER BY "POINT"
 USING KMEANS CLUSTERS 3
order by 3 desc);

Points have been roughly clustered into Central and Western Europe, Eastern Europe and Asia, the USA (using GeoJson.io to visualize the SQL output).

Exercise 3: Grid clustering

Grid clustering provides a quick and easy way to use clustering. It is useful for providing a first impression.

--Grid
select ST_UnionAggr("Envelope") from
(
select ST_CLUSTERID() AS "CID", 
	ST_CLUSTERENVELOPE() AS "Envelope",
	COUNT(*) AS "Number of points in this cluster"
 from "GOOGLE"."LOC_HISTORY"
 GROUP CLUSTER BY "POINT" 
USING GRID X CELLS 160 Y CELLS 80
order by 3 desc
);

This time the grid clustering will give us better overview of the geography of recorded locations.

So, it is not like I’ve traveled half of Asia, but only visited Bengaluru and Istanbul 🙂

Exercise 4: DBSCAN clustering

Density-based spatial clustering of applications with noise (DBSCAN) is best suited to non-spherical clusters. It should works nicely if I want to cluster points along the routs I’ve been taking within 450 kilometers from my home town Wrocław (the center of the city is roughly 51.11N 17.035E).

--DBSCAN
SELECT st_unionAggr("cluster").st_asWKB() from (
SELECT "cluster_id", st_unionAggr("POINT").ST_AlphaShape(0.055) as "cluster"
from (
SELECT
  ST_ClusterID() OVER (CLUSTER BY "POINT" USING DBSCAN EPS 0.001 MINPTS 6) AS "cluster_id",
  "POINT"
FROM "GOOGLE"."LOC_HISTORY"
WHERE "POINT".ST_SRID(4326).ST_Distance(NEW ST_Point(17.035, 51.11).st_srid(4326),'kilometer') < 450
ORDER BY 1
)
where "cluster_id" <> 0
group by "cluster_id");

And here is the result.

You can play with different parameters of DBSCAN clustering and ST_AlphaShape() to get the best shape for visualization of your data.


Maybe next time I should look at the My Places dataset from Google Takeout.

‘Till next #GeospatialTuesday then,
-Vitaliy, aka @Sygyzmundovych

3 Comments
You must be Logged on to comment or reply to a post.