
Feeling the Earth Move with HADOOP & HANA

I’ve been inspired by Cloudera’s example of using Hadoop to collect and collate seismic information, by HANA’s recent geospatial improvements, and by the geographic mapping capability of Mike Bostock’s amazing D3.

I thought it would be interesting to combine these powerful tools into an end-to-end example using:

1)  Hadoop to collect seismic information

2)  HANA to graphically present the data using HANA XS, SAPUI5 & D3.

The following example was built using a Hortonworks HDP 2.0 cluster and HANA SPS7, both running on AWS.

The final results were as follows:

The controls are provided by SAPUI5, and the rotatable globe is made using D3.  This was developed with approximately 800 lines of code.  See below for full code details.

As a brief example of this in action you can watch this short video:

Before I could present the information, I first used HADOOP to automate the following:

a)  Collect the data from Earthquake Hazards Program

b)  Reformat the data and determine deltas since the last run.

c)  Export to HANA

d)  Execute a HANA procedure to update Geo-spatial Location information.

Below is diagram of the HADOOP tools I used:

Each of the steps is summarised below.  They were scheduled in one workflow using HADOOP OOZIE.

a) Get DATA:  I used Cloudera’s Java example to collect recent seismic info:  cloudera/earthquake · GitHub

      Note: I modified Cloudera’s example slightly in order to get place information relating to the quake.

     AronMacDonald/earthquake · GitHub

      The source data is supplied by the US Geological Survey:  Earthquake Archive Search & URL Builder
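The USGS feed is a CSV whose column order matches the QUAKES table shown further below. As an illustrative sketch only (the actual parsing in this example is done by the Java collector and the Pig scripts), one row could be parsed like this, assuming the feed’s published header order and no empty fields:

```javascript
// Sketch: parse one row of the USGS earthquake CSV feed into an object.
// Assumes the feed's header order (time,latitude,...,place,type) and that
// no field is empty; the quoted "place" field may contain commas, so a
// naive split(",") is not enough.
function parseQuakeRow(line) {
  // Match either a quoted field or a run of non-comma characters.
  const fields = line.match(/("[^"]*"|[^,]+)/g)
                     .map(f => f.replace(/^"|"$/g, ""));
  const [time, latitude, longitude, depth, mag, magType,
         nst, gap, dmin, rms, net, id, updated, place, type] = fields;
  return {
    time, id, net, magType, place, type, updated,
    latitude: parseFloat(latitude),
    longitude: parseFloat(longitude),
    depth: parseFloat(depth),
    mag: parseFloat(mag),
    nst: parseInt(nst, 10)
  };
}
```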

b) Pig Scripts

          1) Reformat the data into TAB delimited files (easier for importing text to HANA)

          2) Prepare a delta file, comparing data previously sent to HANA with new data

      [Note: for a simplified version of using PIG with HANA see Using HADOOP PIG to feed HANA Deltas]

     The pig scripts I created for this more complex example are available at  AronMacDonald/Quake_PIG · GitHub
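The heart of the delta step is a set difference on quake ids: anything not already sent to HANA goes into the delta file. The real work is done by the Pig scripts linked above; the same logic can be sketched in plain JavaScript (field names illustrative):

```javascript
// Sketch of the delta logic performed by the Pig script: keep only
// quakes whose id has not already been exported to HANA.
function computeDelta(previouslySent, current) {
  const seen = new Set(previouslySent.map(q => q.id));
  return current.filter(q => !seen.has(q.id));
}
```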

c) SQOOP  was used to export the delta records to HANA

      [Note: for an overview of using SQOOP with HANA see Exporting and Importing DATA  to HANA with HADOOP SQOOP]


     The sqoop export statement for this tab delimited file was:

      sqoop export -D sqoop.export.records.per.statement=1 --username SYSTEM --password manager \
        --connect jdbc:sap://zz.zz.zz.zzz:30015/ --driver com.sap.db.jdbc.Driver --table HADOOP.QUAKES \
        --input-fields-terminated-by '\t' --export-dir /user/admin/quakes/newDelta

    The target table in HANA is:

create column table quakes (
     time      timestamp,
     latitude  decimal(10,5),
     longitude decimal(10,5),
     depth     decimal(7,4),
     mag       decimal(4,2),
     magType   nvarchar(10),
     nst       integer,
     gap       decimal(7,4),
     dmin      decimal(12,8),
     rms       decimal(7,4),
     net       nvarchar(10),
     id        nvarchar(30),
     updated   timestamp,
     place     nvarchar(150),
     type      nvarchar(50)
);


d) Execute a HANA procedure (from HADOOP) to populate geospatial location information for the new records

      [Note: For a simplified example of calling Hana procedures from HADOOP see Creating a HANA Workflow using HADOOP Oozie]


Geospatial information is stored in the following table in HANA:

create column table quakes_geo (
     id        nvarchar(30),
     location  ST_POINT
);


      In order to populate the locations, a HANA procedure (populateQuakeGeo.hdbprocedure) was created which performs the following statement:

   insert into HADOOP.QUAKES_GEO
       (select Q.id, new ST_Point( Q.longitude, Q.latitude )
        from HADOOP.QUAKES as Q
        left outer join HADOOP.QUAKES_GEO as QG
        ON Q.id = QG.id
        where QG.id is null );

Finally, an Oozie workflow was created for the above steps on a Hortonworks HDP 2.0 cluster.

An example of the execution log in the Hadoop User interface (HUE) is:

I then got to work building the HTML5 webpage on HANA XS.

These were the main references I used for building the D3 rotating Globe:

Rotating Orthographic

Rotate the World

Current Global Earthquakes
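Behind D3’s orthographic projection sit the standard spherical formulas, with points on the far hemisphere clipped away as the globe rotates. A hand-rolled sketch of the forward projection (no D3 dependency; in the actual page D3’s built-in projection does this):

```javascript
// Orthographic forward projection: (lon, lat) in degrees -> [x, y] on a
// globe of radius R centred on (lon0, lat0). Returns null when the point
// lies on the hemisphere facing away from the viewer (clipped).
function orthographic(lon, lat, lon0 = 0, lat0 = 0, R = 1) {
  const rad = Math.PI / 180;
  const [l, p, l0, p0] = [lon * rad, lat * rad, lon0 * rad, lat0 * rad];
  // Angular distance from the projection centre decides visibility.
  const cosc = Math.sin(p0) * Math.sin(p) +
               Math.cos(p0) * Math.cos(p) * Math.cos(l - l0);
  if (cosc < 0) return null;                     // back of the globe
  const x = R * Math.cos(p) * Math.sin(l - l0);
  const y = R * (Math.cos(p0) * Math.sin(p) -
                 Math.sin(p0) * Math.cos(p) * Math.cos(l - l0));
  return [x, y];
}
```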

To serve up the quake information in a form easily consumed by D3 (GeoJSON), a custom server-side JavaScript service (quakeLocation.xsjs) was created.

The basis of the GeoJSON output was the following statement, whose date range and quake magnitude parameters are driven by the SAPUI5 controls:

select Q.id, Q.mag, Q.place, Q.time, QG.location.ST_AsGeoJSON() as "GeoJSON"
from HADOOP.QUAKES as Q
left outer join HADOOP.QUAKES_GEO as QG
ON Q.id = QG.id
where QG.id is not null

For a simplified version of using D3 with HANA, including an example of how to create XSJS GeoJSON, see Serving up Apples & Pears: Spatial Data and D3
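On the XSJS side, the rows returned by that statement are wrapped into a GeoJSON FeatureCollection that D3 can bind to directly. A minimal sketch of that wrapping, with illustrative field names (the real quakeLocation.xsjs is in the project download below):

```javascript
// Sketch: wrap quake result rows into a GeoJSON FeatureCollection.
// Each row carries the ST_AsGeoJSON() output (a Point geometry string)
// plus the attributes used to size and label the markers.
function toFeatureCollection(rows) {
  return {
    type: "FeatureCollection",
    features: rows.map(r => ({
      type: "Feature",
      geometry: JSON.parse(r.GeoJSON),  // '{"type":"Point","coordinates":[lon,lat]}'
      properties: { id: r.id, mag: r.mag, place: r.place, time: r.time }
    }))
  };
}
```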

The complete HANA XS project (including the above-mentioned XSJS, procedure and HTML5 source code) is available to download here: – Google Drive

I hope you found this example interesting and I hope it inspires you to automate your HADOOP HANA workflows with OOZIE, as well as exploring the graphical visualisation capabilities of SAPUI5 & D3.

      Kevin Small

      Very impressive Aron, great work.  I didn't know D3 could do so much with so little code.

      Former Member
      Blog Post Author

      Thanks Kevin.  Glad you liked it.  🙂

      Former Member
      Blog Post Author

      Just saw another amazing D3 orthographic Earth showing near real-time wind current measurements, rendered as animations (transition effects).

      I doubt this example was built on HANA, but it wouldn't be hard to replicate. You just need a suitably large dataset to make it interesting.