
Recently I wrote a blog post on how to use import.io to extract data from any web page and load it into Lumira. This in itself opens up a ton of possibilities. Just think about all the information you come across in day-to-day life but have no way of analyzing because it is locked up in a website. Well, not anymore: import.io can help you extract any data you like through a user-friendly interface.

But sometimes, loading data into Lumira is not enough. Let’s say you want to load that extracted data into HANA in a single step for further processing. In that case you need an extractor that connects directly to HANA. Well, my dear friends, that’s exactly what I built.


From a flat file to Lumira to a fully automated extractor

I reused the example from my previous blog so that you can easily follow along, but I trimmed the results a bit to keep the example simple.

I edited my extractor and removed all unnecessary columns. The end result looks like this, with just three simple columns:

[Screenshot: the trimmed extractor with its three columns]

Please note that I defined all columns as “Text” so that my service doesn’t get extra metadata I don’t need. Import.io is smart and adds extra information such as currency and other source data, but my example doesn’t need any of that. Defining the columns as text prevents it.

 
Creating a basic extraction script

The first thing to do is open the import.io browser again and press the “Integrate” button on the “My Data” page:

[Screenshot: the integrate button on the “My Data” page]

This will bring you to a page with integration options for import.io. For my extractor I will use my trusty companion: Python.

The cool thing is that import.io already generates a Python script based on the data sources you just created! Not a single extra line of code is required to get the data in JSON format.

Be sure to follow the steps as mentioned on the page:

[Screenshot: the setup steps on the integration page]

These client libraries are required to execute the Python script. In addition, you need the “requests” library to push the records into HANA, so download and install it as well, following the instructions here:

http://docs.python-requests.org/en/latest/user/install/
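Before going further, you may want to confirm that Python can actually find the “requests” library. The minimal sketch below uses only the standard library’s `importlib.util.find_spec`; the function name `requests_available` is purely illustrative and not part of import.io or requests.

```python
import importlib.util


def requests_available():
    """Return True if the "requests" package can be imported, False otherwise."""
    # find_spec returns None when no module with that name is installed
    return importlib.util.find_spec("requests") is not None


if __name__ == "__main__":
    if requests_available():
        print("requests is installed")
    else:
        print("requests is missing, install it before continuing")
```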

The next step is to download the example script (be sure to enter your password at step 2 so that your API key is filled in automatically!):


[Screenshot: downloading the example script]


Modifications needed to have the script post the records to HANA

First off, we need a table in HANA. Create one in the SAP HANA Development view in HANA Studio.

File name “wsop2.hdbtable”:

```
table.schemaName = "PHILIPS";
table.tableType = COLUMNSTORE;
table.columns = [
    {name = "name"; sqlType = NVARCHAR; length = 100;},
    {name = "bracelets"; sqlType = INTEGER;},
    {name = "rings"; sqlType = INTEGER;}
];
table.primaryKey.pkcolumns = ["name"];
```

Of course, use your own schema!
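Because all columns were defined as “Text” in import.io, the values will most likely arrive as strings, while `bracelets` and `rings` are INTEGER columns in the table above. If HANA does not accept those strings, a small conversion step before posting may help. The sketch below assumes each import.io row is a dict whose keys match the column names (`name`, `bracelets`, `rings`); `to_table_row` and `to_int` are hypothetical helpers, not something import.io generates.

```python
def to_int(value):
    """Convert a text cell to an int; missing or empty cells become 0."""
    if value is None:
        return 0
    # Remove surrounding spaces and thousands separators such as "1,234"
    text = str(value).strip().replace(",", "")
    # Non-numeric text raises ValueError, so bad rows show up early
    return int(text) if text else 0


def to_table_row(row):
    """Map a text-only import.io row onto the wsop2 table columns.

    Hypothetical helper: assumes the row keys match the column names.
    """
    return {
        # NVARCHAR(100): strip spaces and truncate to the column length
        "name": str(row.get("name") or "").strip()[:100],
        "bracelets": to_int(row.get("bracelets")),  # INTEGER column
        "rings": to_int(row.get("rings")),  # INTEGER column
    }
```

In the push loop you would then post `json.dumps(to_table_row(row))` instead of `json.dumps(row)`. Whether an empty cell should become 0 or be left out entirely is a design choice for your own data.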

Define your service:

File name “wsop2.xsodata”:

```
service namespace "wsop2" {
    "PHILIPS"."wsop::wsop2" as "WSOP2";
}
```

These steps make sure that you have a table and a service to post your records to.
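The URL you will post to is the OData entity set exposed by this service, not just the `.xsodata` file itself. On a classic XS engine that is typically the package path, then the service file, then the entity set name (`WSOP2` above). The sketch below assumes the files live in a package called `wsop` (as the `wsop::wsop2` table name suggests) and uses made-up host and port values; `entity_set_url` is a hypothetical helper, so adjust everything to your own system.

```python
def entity_set_url(host, port, package, service_file, entity_set, https=False):
    """Build the URL of an OData entity set on a classic XS engine (assumed layout).

    package uses dot notation, e.g. "wsop" or "my.company.wsop"; XS serves
    packages as paths, so dots become slashes.
    """
    scheme = "https" if https else "http"
    path = package.replace(".", "/")
    return "%s://%s:%d/%s/%s/%s" % (scheme, host, port, path, service_file, entity_set)


# Hypothetical host and port: replace with your own HANA system
url = entity_set_url("hana.example.com", 8000, "wsop", "wsop2.xsodata", "WSOP2")
print(url)
```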

Now open the Python script you downloaded in the previous step and add the following lines of code at the bottom:


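```python
# Now let's push the records to HANA
import json

import requests

print("Pushing to HANA")

url = 'Your XS Service URL'  # placeholder: the URL of the service you defined
headers = {"Content-type": "application/json;charset=utf-8"}
auth = ('YOUR_USER', 'YOUR_PW')  # placeholders: your HANA user ID and password

# dataRows holds the records collected earlier in the import.io script
for row in dataRows:
    r = requests.post(url, data=json.dumps(row), headers=headers, auth=auth)
    if r.status_code == 201:  # 201 Created
        print("Record successfully created in HANA")
    else:
        # Any other status means the insert failed; a duplicate primary key is one likely cause
        print("Could not create the record (HTTP %d), we may have a duplicate in HANA!" % r.status_code)
```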

Of course, enter the URL of the service you defined, along with the user ID and password for your HANA system!
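If you push more than a handful of rows, it can help to count the outcomes instead of printing a line per record. The sketch below is one way to do that, not what the import.io script generates: `push_rows` is a hypothetical helper that takes any function returning an HTTP status code, so it can be tried without a HANA system. With the requests library, that function could be `lambda row: session.post(url, data=json.dumps(row), headers=headers).status_code`, where `session` is a `requests.Session()` with `session.auth` set to your HANA user and password, so one connection can be reused for all rows.

```python
def push_rows(rows, post_row):
    """Post each row and count the outcomes.

    post_row(row) must return the HTTP status code of the POST request.
    An OData create answers 201 Created on success; anything else is
    counted as a failure (a duplicate key is one possible cause).
    """
    created = 0
    failed = []  # (row, status code) pairs for later inspection
    for row in rows:
        status = post_row(row)
        if status == 201:
            created += 1
        else:
            failed.append((row, status))
    return created, failed


if __name__ == "__main__":
    # Stand-in for HANA with hypothetical data: reject names already posted.
    # The real status code HANA returns for a duplicate may differ.
    seen = set()

    def fake_post(row):
        if row["name"] in seen:
            return 400
        seen.add(row["name"])
        return 201

    rows = [{"name": "A"}, {"name": "B"}, {"name": "A"}]
    created, failed = push_rows(rows, fake_post)
    print("%d created, %d failed" % (created, len(failed)))
```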


That’s all folks!

Really, that’s all folks! With just a couple of simple steps you can extract any data you find on the web and push it into HANA. I don’t know about you, but I am really excited about the possibilities this brings. Gather a ton of information from the web and analyse it endlessly in HANA!

To illustrate the end result, here is a short clip:

[Video: the end result]

Thank you for reading this and take care,

Ronald.

P.S. In case you want to take a look at my complete Python script, look here. And thank you, P. R. BI-Formance BV, for lending me your awesome “BW on HANA” system to tinker with!
