In my previous blog I promised exploration of Big Data on SAP HANA, express edition. But remember, the Big Data is not only about the Volume, but about Variety (of data types) as well. And this is the route I chose first to look at the fun stuff you can do with spatial data processing in SAP HANA.
Ever since I enjoyed “Calculating Pi with Darts” video from Physics Girl and Veritasium [which you should watch too!] I have thought about repeating it. The world is going digital, so obviously I meant using SAP HANA for that. I know I should have done it during the PI Day (3/14/16, or on the 14th of March 2016), but better later than never!
Calculating pi number with darts is one of the Monte Carlo methods of getting its approximate value. Accordingly to Wikipedia “[…] method for computing π is to draw a circle inscribed in a square, and randomly place dots in the square. The ratio of dots inside the circle to the total number of dots will approximately equal π/4”
Looked like SAP HANA’s spatial capabilities would fit perfectly for that. If you are not familiar with spatial processing I prepared four introductory tutorials that should not take more than 20 minutes for your to complete and understand all basic concepts needed to follow the rest of the blog. And if you do not have SAP HANA Express yet, then it is 10 minutes to get it. Alternatively you can use as well SAP HANA MDC instance in your HCP Trial account as we are still not talking about huge volumes of data here.
- Points: http://www.sap.com/developer/tutorials/hana-spatial-intro1-point.html
- Lines and strings: http://www.sap.com/developer/tutorials/hana-spatial-intro2-string.html
- Areas and polygons: http://www.sap.com/developer/tutorials/hana-spatial-intro3-polygon.html
- Spatial columns in tables: http://www.sap.com/developer/tutorials/hana-spatial-intro4-columns.html
Virtual dart hits are points with random X and Y coordinates (objects ST_Point). The dartboard is a disk (ST_Buffer() around a ring’s center point). And then calculation of the average of hits within an area of the disk (ST_Within() method).
First I need a table with a spatial column, which will store coordinates of my digital hits, plus a procedure to populate this table with required number of attempts.
CREATE SCHEMA "TESTSGEO"; SET SCHEMA "TESTSGEO"; --DROP TABLE "TESTSGEO"."SPATIAL_CALCPI"; CREATE COLUMN TABLE SPATIAL_CALCPI( POINT ST_POINT ); --DROP PROCEDURE "TESTSGEO"."COLLECT_HITS"; CREATE PROCEDURE collect_hits (IN attempts INT) LANGUAGE SQLSCRIPT AS iter INTEGER; BEGIN iter := 1; WHILE iter<=attempts DO INSERT INTO "TESTSGEO"."SPATIAL_CALCPI" VALUES (new st_point(RAND(), RAND())); iter := iter+1; END WHILE; MERGE DELTA OF "TESTSGEO"."SPATIAL_CALCPI"; END;
Now let’s check the result of throwing 2000 virtual darts and what the PI number approximation will be!
--TRUNCATE TABLE "TESTSGEO"."SPATIAL_CALCPI"; CALL "TESTSGEO"."COLLECT_HITS"(ATTEMPTS => 2000); --Check the results of throwing: coordinates and if hit dartboard SELECT POINT.ST_asWKT(), POINT.ST_Within(NEW ST_Point(0.5,0.5).ST_Buffer(0.5)) as IN_CIRCLE FROM "TESTSGEO"."SPATIAL_CALCPI"; --Calculating PI using Monte Carlo formula SELECT 4*AVG(POINT.ST_Within(NEW ST_Point(0.5,0.5).ST_Buffer(0.5))) as PI FROM "TESTSGEO"."SPATIAL_CALCPI";
Results I got in my system were between 3.11 and 3.21. Well, very rough approximation of number π 🙂
Let’s visualize the results by generating SVG with a dartboard and all generated hits.
SELECT ST_UnionAggr(POINT).ST_Union(NEW ST_CircularString('CIRCULARSTRING(0 0.5, 1 0.5, 0 0.5)')).ST_asSVG() AS DARTBOARD FROM "TESTSGEO"."SPATIAL_CALCPI";
I did a minor modification of the SVG to have a circle in red.
Then I tried 50000 attempts, but the result was 3.1168. So, no much improvement over previous attempts.
PS. Obviously using below SAP HANA spatial method calculating a circle’s circumference when diameter is 1 would be much faster and precise way to get the pi. But – hey! – it would take away all the fun of throwing digital darts 😉
SELECT NEW ST_CircularString ('CircularString (0 0.5, 0 1.5, 0 0.5)').ST_Length() as PI FROM DUMMY; --Result is PI 3.141592653589793
Please let me know what pi numbers you got by throwing digital darts in your SAP HANA instances.