Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Vitaliy-R
Developer Advocate
Developer Advocate
In part 1 we requested and loaded Location History from a personal account in Google. There are too many single points (recorded locations) to efficiently visualize collected data. Let's have a look at the data using geospatial clustering available in SAP HANA 2.0 SPS 03.

Let's continue with exercises.

Exercise 2: K-means clustering


K-means tries to find an assignment of points to clusters, so that the sum of squared distances of the points to the center of the cluster they belong to is minimal.

Let's have a look at high-level clusters of all my locations for the complete history.
--KMeans
select ST_UnionAggr("ConvexHull") from
(select ST_CLUSTERID() AS "CID",
ST_CONVEXHULLAGGR("POINT").st_srid(4326) AS "ConvexHull",
COUNT(*) AS "Number of points in this cluster"
from "GOOGLE"."LOC_HISTORY"
GROUP CLUSTER BY "POINT"
USING KMEANS CLUSTERS 3
order by 3 desc);

Points have been roughly clustered into Central and Western Europe, Eastern Europe and Asia, the USA (using GeoJson.io to visualize the SQL output).


Exercise 3: Grid clustering


Grid clustering provides a quick and easy way to use clustering. It is useful for providing a first impression.
--Grid
select ST_UnionAggr("Envelope") from
(
select ST_CLUSTERID() AS "CID",
ST_CLUSTERENVELOPE() AS "Envelope",
COUNT(*) AS "Number of points in this cluster"
from "GOOGLE"."LOC_HISTORY"
GROUP CLUSTER BY "POINT"
USING GRID X CELLS 160 Y CELLS 80
order by 3 desc
);

This time the grid clustering will give us better overview of the geography of recorded locations.



So, it is not like I've traveled half of Asia, but only visited Bengaluru and Istanbul 🙂

Exercise 4: DBSCAN clustering


Density-based spatial clustering of applications with noise (DBSCAN) is best suited to non-spherical clusters. It should works nicely if I want to cluster points along the routs I've been taking within 450 kilometers from my home town Wrocław (the center of the city is roughly 51.11N 17.035E).
--DBSCAN
SELECT st_unionAggr("cluster").st_asWKB() from (
SELECT "cluster_id", st_unionAggr("POINT").ST_AlphaShape(0.055) as "cluster"
from (
SELECT
ST_ClusterID() OVER (CLUSTER BY "POINT" USING DBSCAN EPS 0.001 MINPTS 6) AS "cluster_id",
"POINT"
FROM "GOOGLE"."LOC_HISTORY"
WHERE "POINT".ST_SRID(4326).ST_Distance(NEW ST_Point(17.035, 51.11).st_srid(4326),'kilometer') < 450
ORDER BY 1
)
where "cluster_id" <> 0
group by "cluster_id");

And here is the result.



You can play with different parameters of DBSCAN clustering and ST_AlphaShape() to get the best shape for visualization of your data.




Maybe next time I should look at the My Places dataset from Google Takeout.

‘Till next #GeospatialTuesday then,
-Vitaliy, aka @Sygyzmundovych
3 Comments