
Feeling the Earth Move with HADOOP & HANA

I’ve been inspired by Cloudera’s example of using Hadoop to collect and collate seismic information, HANA’s recent geospatial improvements, and the geographic mapping capability of Mike Bostock’s amazing D3.

I thought it would be interesting to combine these powerful tools into an end-to-end example using:

1)  Hadoop to collect seismic information

2)  HANA to graphically present the data using HANA XS, SAPUI5 & D3.


The following example was built using a Hortonworks HDP 2.0 cluster and HANA SPS7, both running on AWS.


The final results were as follows:


The controls are provided by SAPUI5, and the rotatable globe is made using D3.  This was developed with approximately 800 lines of code.  See below for full code details.


As a brief example of this in action you can watch this short video:






Before I could present the information, I first used HADOOP to automate the following:

a)  Collect the data from the Earthquake Hazards Program

b)  Reformat the data and determine deltas since the last run.

c)  Export to HANA

d)  Execute a HANA procedure to update Geo-spatial Location information.

Below is a diagram of the HADOOP tools I used:

Each of the steps is summarised below.  They were scheduled in one workflow using HADOOP OOZIE.


a) Get DATA:  I used Cloudera’s JAVA example to collect recent seismic info: cloudera/earthquake · GitHub

      Note: I modified Cloudera’s example slightly in order to get place information relating to the quake.

     AronMacDonald/earthquake · GitHub


      The source data is supplied by the US Geological Survey:  Earthquake Archive Search & URL Builder



b) Pig Scripts

          1) Reformat the data into TAB delimited files (easier for importing text to HANA)

          2) Prepare a Delta file, comparing data previously sent to HANA with new data

      [Note: for a simplified version of using PIG with HANA see Using HADOOP PIG to feed HANA Deltas]


     The pig scripts I created for this more complex example are available at  AronMacDonald/Quake_PIG · GitHub
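The actual delta logic lives in the Pig scripts linked above; as a rough illustration of the idea, here is a hedged Python sketch of the same comparison — keep only records whose quake id has not already been sent to HANA. The file layout, field positions and sample ids are invented for illustration and are not taken from the real scripts.

```python
# Illustrative sketch of the delta step: keep only quake records whose id
# has not already been exported to HANA. The tab-delimited layout, the id
# column position and the sample values are assumptions, not the actual
# logic from AronMacDonald/Quake_PIG.

def compute_delta(previous_rows, new_rows, id_index):
    """Return rows from new_rows whose id column is not in previous_rows."""
    seen = {row.split("\t")[id_index] for row in previous_rows}
    return [row for row in new_rows if row.split("\t")[id_index] not in seen]

previous = ["2014-01-01T00:00:00\t...\tusb000abcd"]  # already sent to HANA
new = [
    "2014-01-01T00:00:00\t...\tusb000abcd",          # duplicate, dropped
    "2014-01-02T03:04:05\t...\tusb000wxyz",          # new record, kept
]
delta = compute_delta(previous, new, id_index=2)
```

In Pig this kind of anti-join is typically expressed as a LEFT OUTER JOIN on the id followed by a FILTER on null, which is also how the HANA procedure in step (d) below finds its new rows.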


c) SQOOP  was used to export the delta records to HANA

      [Note: for an overview of using SQOOP with HANA see Exporting and Importing DATA  to HANA with HADOOP SQOOP]

    

     The sqoop export statement for this tab delimited file was:

      sqoop export -D sqoop.export.records.per.statement=1 --username SYSTEM --password manager

     --connect jdbc:sap://zz.zz.zz.zzz:30015/ --driver com.sap.db.jdbc.Driver --table HADOOP.QUAKES

     --input-fields-terminated-by '\t' --export-dir /user/admin/quakes/newDelta


    The target table in HANA is:

create column table quakes (

     time      timestamp,

     latitude  decimal(10,5),

     longitude decimal(10,5),

     depth     decimal(7,4),

     mag       decimal(4,2),

     magType   nvarchar(10),

     nst       integer,

     gap       decimal(7,4),

     dmin      decimal(12,8),

     rms       decimal(7,4),

     net       nvarchar(10),

     id        nvarchar(30),

     updated   timestamp,

     place     nvarchar(150),

     type      nvarchar(50)


);
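Each line of the tab-delimited delta file must match this column order for the sqoop export to load cleanly. As a hedged sketch (the sample line and parser are invented for illustration, not real USGS output), one record maps onto the table like this:

```python
# Illustrative parser for one tab-delimited line destined for HADOOP.QUAKES.
# The column order follows the CREATE TABLE statement above; the sample
# line below is invented, not real USGS data.
from decimal import Decimal

COLUMNS = ["time", "latitude", "longitude", "depth", "mag", "magType",
           "nst", "gap", "dmin", "rms", "net", "id", "updated",
           "place", "type"]

def parse_quake_line(line):
    """Split a tab-delimited quake record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    row = dict(zip(COLUMNS, fields))
    # Cast the numeric columns to the types the HANA table expects.
    for col in ("latitude", "longitude", "depth", "mag", "gap", "dmin", "rms"):
        row[col] = Decimal(row[col]) if row[col] else None
    row["nst"] = int(row["nst"]) if row["nst"] else None
    return row

line = ("2014-01-02T03:04:05\t38.12345\t-122.54321\t7.2\t4.5\tml\t25\t"
        "110.5\t0.123\t0.45\tnc\tnc123456\t2014-01-02T04:00:00\t"
        "Northern California\tearthquake")
quake = parse_quake_line(line)
```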



d) Execute a HANA procedure (from HADOOP) to populate geospatial location information for the new records

      [Note: For a simplified example of calling HANA procedures from HADOOP see Creating a HANA Workflow using HADOOP Oozie]

   

Geospatial information is stored in the following table in HANA:

create column table quakes_geo (

     id        nvarchar(30),

     location ST_POINT  

);

      In order to populate the locations, a HANA procedure (populateQuakeGeo.hdbprocedure) was created which performs the following statement:

   insert into HADOOP.QUAKES_GEO

       (select Q.id, new ST_Point( Q.longitude , Q.latitude ) 

        from HADOOP.QUAKES as Q 

        left outer join HADOOP.QUAKES_GEO as QG

        ON Q.id = QG.id

        where QG.id is null );

Finally, an Oozie workflow was created for the above steps on a Hortonworks HDP 2.0 cluster.

An example of the execution log in the Hadoop User interface (HUE) is:


I then got to work building the HTML5 webpage on HANA XS.

These were the main references I used for building the D3 rotating Globe:

Rotating Orthographic

Rotate the World

Current Global Earthquakes

To serve up the quake information in a form easily consumed by D3 [geojson], a custom server-side JavaScript (quakeLocation.xsjs) was created.

The basis of the geojson output was the following statement, which used the SAPUI5 controls for Date range and Quake magnitude:

select Q.id, Q.mag, Q.place, Q.time, QG.location.ST_AsGeoJSON() as “GeoJSON” 

from HADOOP.QUAKES as Q 

left outer join HADOOP.QUAKES_GEO as QG

ON Q.id = QG.id

where QG.id is not null
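The xsjs then wraps each result row in a GeoJSON Feature for D3. The real implementation is the quakeLocation.xsjs in the project download; purely as a hedged sketch of that transformation (the tuple shape and sample row are assumptions), the assembly looks roughly like this:

```python
# Hedged sketch: assemble rows like those returned by the statement above
# into a GeoJSON FeatureCollection for D3. The actual code is the
# quakeLocation.xsjs in the project download; this shape is an assumption.
import json

def rows_to_feature_collection(rows):
    """rows: (id, mag, place, time, geojson_point_string) tuples."""
    features = []
    for quake_id, mag, place, time, point in rows:
        features.append({
            "type": "Feature",
            "geometry": json.loads(point),  # output of ST_AsGeoJSON()
            "properties": {"id": quake_id, "mag": mag,
                           "place": place, "time": time},
        })
    return {"type": "FeatureCollection", "features": features}

rows = [("nc123456", 4.5, "Northern California", "2014-01-02T03:04:05",
         '{"type": "Point", "coordinates": [-122.54321, 38.12345]}')]
fc = rows_to_feature_collection(rows)
```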

For a simplified version of using D3 with HANA, including an example of how to create geojson in XSJS, see Serving up Apples & Pears: Spatial Data and D3

The complete HANA XS project (including the above-mentioned XSJS, procedure and HTML5 source code) is available to download here:

HadoopQuakes.zip – Google Drive


I hope you found this example interesting, and that it inspires you to automate your HADOOP/HANA workflows with OOZIE, as well as to explore the graphical visualisation capabilities of SAPUI5 & D3.


      3 Comments
      Kevin Small

      Very impressive Aron, great work.  I didn't know D3 could do so much with so little code.

      Former Member
      Blog Post Author

      Thanks Kevin.  Glad you liked it.  🙂

      Former Member
      Blog Post Author

      Just saw another amazing D3 orthographic Earth, showing near real-time wind current measurements as animations (transition effects).


      http://earth.nullschool.net/



      I doubt this example was built on HANA, but it wouldn't be hard to replicate. You just need a suitably large data set to make it interesting.