I wanted to share with the community one of my personal projects. I had been thinking about how to do physics analysis on SAP HANA. To start off, I did a simple PoC, following these steps.
I need High Energy Particle Physics data. I wish I could get the real data from a collder, but for a simple PoC, I settled for data from a Physics event generator. I chose Pythia 8.1 as it is one of the most widely used in Physics community. I set Pythia command file to collide p+ with p+ at a CM energy of 1.400e+04 GeV, for 100,000 times (or ‘events’ in Physicist’s term). I first tried with 1M events but that totally bombed my machine.
2. Interface with SAP HANA
I modified a C++ example analysis program (main03.cc) included in Pythia 8.1 package to output generated events to data.csv on a HANA server, which is then imported into a table for analysis.
3. Analysis in SAP HANA
I wrote a procedure, which is equivalent to C++ code in main03.cc but in SQL Script. The sql is shown below.
create procedure testPythia ( out epDev doubleTab ) language sqlscript reads sql data as begin t_main = select * from "SYSTEM"."Pythia" ; t_main_with_rownum = CE_PROJECTION(:t_main, [CE_CALC('rownum()', integer) as rownum, "tt" as e] ); t_ievent_with_rownum = CE_PROJECTION(:t_main, [CE_CALC('rownum()', integer) as rownum, "ievent" ] ); t_rownum_of_firstrow_in_ievent = select "ievent", min(rownum) as firstrownum from :t_ievent_with_rownum group by "ievent"; t_first_e_of_each_event = select a."ievent", b.e as e from :t_rownum_of_firstrow_in_ievent as a inner join :t_main_with_rownum as b on a.firstrownum = b.rownum; t_ievent_sumpxyz = select "ievent", sum("px") as sumpx, sum("py") as sumpy, sum("pz") as sumpz, sum("tt") as sume from :t_main where "isfinal" = 1 group by "ievent"; t_ievent_sumpxyz_with_first_e = select a."ievent", a.e as "firstE", b.sumpx, b.sumpy, b.sumpz, b.sume from :t_first_e_of_each_event as a inner join :t_ievent_sumpxyz as b on a."ievent" = b."ievent"; t_ievent_sumxyz_div_by_first_e = select "ievent", (sumpx/"firstE") as pxdiv, (sumpy/"firstE") as pydiv, (sumpz/"firstE") as pzdiv, (sume/"firstE") as ediv from :t_ievent_sumpxyz_with_first_e; t_epdev = select abs(ediv - 1) as e, abs(pxdiv) as px , abs(pydiv)as py, abs(pzdiv) as pz from :t_ievent_sumxyz_div_by_first_e; epDev = select e+px+py+pz as epDev from :t_epdev; end;
The calculation is a simple one but it’s good enough for illustration purpose. I did not attempt to optimize the procedure. The server processing time is 2:00.711 minutes for this analysis of 100,000 events and 189,505,757 particle records.
Using this same setup, the next steps would be to change the beam energy in Pythia to 6.5 TeV per beam to match the planned beam energy of the LHC for 2015. More importantly, I need to find a more meaningful analysis that also better showcases the power of SAP HANA. I welcome your suggestion.