
Feeling the Earth Move with HADOOP & HANA

I’ve been inspired by Cloudera’s example of using Hadoop to collect and collate seismic information, by HANA’s recent geospatial improvements, and by the geographic mapping capability of Mike Bostock’s amazing D3.

I thought it would be interesting to combine these powerful tools into an end-to-end example using:

1)  Hadoop to collect seismic information

2)  HANA to graphically present the data using HANA XS, SAPUI5 & D3.

The following example was built using a Hortonworks HDP 2.0 cluster and HANA SPS7, both running on AWS.

The final results were as follows:

The controls are provided by SAPUI5, and the rotatable globe is made using D3.  This was developed with approximately 800 lines of code.  See below for full code details.

As a brief example of this in action you can watch this short video:

Before I could present the information, I first used HADOOP to automate the following:

a)  Collect the data from Earthquake Hazards Program

b)  Reformat the data and determine deltas since the last run.

c)  Export to HANA

d)  Execute a HANA procedure to update Geo-spatial Location information.

Below is diagram of the HADOOP tools I used:

Each of the steps is summarised below.  They were scheduled in one workflow using HADOOP OOZIE.

a) Get DATA:  I used Cloudera’s Java example to collect recent seismic info:  cloudera/earthquake · GitHub

      Note: I modified Cloudera’s example slightly in order to get place information relating to the quake.

     AronMacDonald/earthquake · GitHub

      The source data is supplied by the US Geological Survey:  Earthquake Archive Search & URL Builder
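The USGS feed is a CSV whose column order matches the QUAKES table shown further below. As an illustrative sketch only (the actual parsing in this example is done by the Java collector and the Pig scripts), one row could be parsed like this, assuming the feed’s published header order and no empty fields:

```javascript
// Sketch: parse one row of the USGS earthquake CSV feed into an object.
// Assumes the feed's header order (time,latitude,...,place,type) and that
// no field is empty; the quoted "place" field may contain commas, so a
// naive split(",") is not enough.
function parseQuakeRow(line) {
  // Match either a quoted field or a run of non-comma characters.
  const fields = line.match(/("[^"]*"|[^,]+)/g)
                     .map(f => f.replace(/^"|"$/g, ""));
  const [time, latitude, longitude, depth, mag, magType,
         nst, gap, dmin, rms, net, id, updated, place, type] = fields;
  return {
    time, id, net, magType, place, type, updated,
    latitude: parseFloat(latitude),
    longitude: parseFloat(longitude),
    depth: parseFloat(depth),
    mag: parseFloat(mag),
    nst: parseInt(nst, 10)
  };
}
```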

b) Pig Scripts

          1) Reformat the data into TAB delimited files (easier for importing text to HANA)

          2) Prepare a delta file, comparing data previously sent to HANA with new data

      [Note: for a simplified version of using PIG with HANA see Using HADOOP PIG to feed HANA Deltas]

     The pig scripts I created for this more complex example are available at  AronMacDonald/Quake_PIG · GitHub
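The heart of the delta step is a set difference on quake ids: anything not already sent to HANA goes into the delta file. The real work is done by the Pig scripts linked above; the same logic can be sketched in plain JavaScript (field names illustrative):

```javascript
// Sketch of the delta logic performed by the Pig script: keep only
// quakes whose id has not already been exported to HANA.
function computeDelta(previouslySent, current) {
  const seen = new Set(previouslySent.map(q => q.id));
  return current.filter(q => !seen.has(q.id));
}
```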

c) SQOOP  was used to export the delta records to HANA

      [Note: for an overview of using SQOOP with HANA see Exporting and Importing DATA  to HANA with HADOOP SQOOP]


     The sqoop export statement for this tab delimited file was:

      sqoop export -D sqoop.export.records.per.statement=1 --username SYSTEM --password manager \
        --connect jdbc:sap://zz.zz.zz.zzz:30015/ --driver com.sap.db.jdbc.Driver --table HADOOP.QUAKES \
        --input-fields-terminated-by '\t' --export-dir /user/admin/quakes/newDelta

    The target table in HANA is:

create column table quakes (
     time      timestamp,
     latitude  decimal(10,5),
     longitude decimal(10,5),
     depth     decimal(7,4),
     mag       decimal(4,2),
     magType   nvarchar(10),
     nst       integer,
     gap       decimal(7,4),
     dmin      decimal(12,8),
     rms       decimal(7,4),
     net       nvarchar(10),
     id        nvarchar(30),
     updated   timestamp,
     place     nvarchar(150),
     type      nvarchar(50)
);


d) Execute a HANA procedure (from HADOOP) to populate geospatial location information for the new records

      [Note: For a simplified example of calling Hana procedures from HADOOP see Creating a HANA Workflow using HADOOP Oozie]


Geospatial information is stored in the following table in HANA:

create column table quakes_geo (
     id        nvarchar(30),
     location  ST_POINT
);


      In order to populate the locations, a HANA procedure (populateQuakeGeo.hdbprocedure) was created which performs the following statement:

   insert into HADOOP.QUAKES_GEO
       (select Q.id, new ST_Point( Q.longitude, Q.latitude )
        from HADOOP.QUAKES as Q
        left outer join HADOOP.QUAKES_GEO as QG
        ON Q.id = QG.id
        where QG.id is null );

Finally, an Oozie workflow was created for the above steps on a Hortonworks HDP 2.0 cluster.

An example of the execution log in the Hadoop User interface (HUE) is:

I then got to work building the HTML5 webpage on HANA XS.

These were the main references I used for building the D3 rotating Globe:

Rotating Orthographic

Rotate the World

Current Global Earthquakes
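Behind D3’s orthographic projection sit the standard spherical formulas, with points on the far hemisphere clipped away as the globe rotates. A hand-rolled sketch of the forward projection (no D3 dependency; in the actual page D3’s built-in projection does this):

```javascript
// Orthographic forward projection: (lon, lat) in degrees -> [x, y] on a
// globe of radius R centred on (lon0, lat0). Returns null when the point
// lies on the hemisphere facing away from the viewer (clipped).
function orthographic(lon, lat, lon0 = 0, lat0 = 0, R = 1) {
  const rad = Math.PI / 180;
  const [l, p, l0, p0] = [lon * rad, lat * rad, lon0 * rad, lat0 * rad];
  // Angular distance from the projection centre decides visibility.
  const cosc = Math.sin(p0) * Math.sin(p) +
               Math.cos(p0) * Math.cos(p) * Math.cos(l - l0);
  if (cosc < 0) return null;                     // back of the globe
  const x = R * Math.cos(p) * Math.sin(l - l0);
  const y = R * (Math.cos(p0) * Math.sin(p) -
                 Math.sin(p0) * Math.cos(p) * Math.cos(l - l0));
  return [x, y];
}
```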

To serve up the quake information in a form easily consumed by D3 (GeoJSON), a custom server-side JavaScript service (quakeLocation.xsjs) was created.

The basis of the GeoJSON output was the following statement, whose date range and quake magnitude parameters are driven by the SAPUI5 controls:

select Q.id, Q.mag, Q.place, Q.time, QG.location.ST_AsGeoJSON() as "GeoJSON"
from HADOOP.QUAKES as Q
left outer join HADOOP.QUAKES_GEO as QG
ON Q.id = QG.id
where QG.id is not null

For a simplified version of using D3 with HANA, including an example of how to create XSJS GeoJSON, see Serving up Apples & Pears: Spatial Data and D3
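On the XSJS side, the rows returned by that statement are wrapped into a GeoJSON FeatureCollection that D3 can bind to directly. A minimal sketch of that wrapping, with illustrative field names (the real quakeLocation.xsjs is in the project download below):

```javascript
// Sketch: wrap quake result rows into a GeoJSON FeatureCollection.
// Each row carries the ST_AsGeoJSON() output (a Point geometry string)
// plus the attributes used to size and label the markers.
function toFeatureCollection(rows) {
  return {
    type: "FeatureCollection",
    features: rows.map(r => ({
      type: "Feature",
      geometry: JSON.parse(r.GeoJSON),  // '{"type":"Point","coordinates":[lon,lat]}'
      properties: { id: r.id, mag: r.mag, place: r.place, time: r.time }
    }))
  };
}
```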

The complete HANA XS project (including the above-mentioned XSJS, procedure and HTML5 source code) is available to download here: – Google Drive

I hope you found this example interesting and I hope it inspires you to automate your HADOOP HANA workflows with OOZIE, as well as exploring the graphical visualisation capabilities of SAPUI5 & D3.

      Kevin Small

      Very impressive Aron, great work.  I didn't know D3 could do so much with so little code.

      Former Member
      Blog Post Author

      Thanks Kevin.  Glad you liked it.  🙂

      Former Member
      Blog Post Author

      Just saw another amazing D3 orthographic Earth showing near real-time wind current measurements, rendered as animations (transition effects).

      I doubt this example was built on HANA, but it wouldn't be hard to replicate. You just need a suitably large dataset to make it interesting.