Skip to Content

DISCLAIMER: I’m not an SAP HANA expert or an R expert, not even a Python expert. I’m just a guy with a lot of ideas who loves to write blogs.

The other day I was thinking about making some nice with SAP HANA and R, because people doesn’t seem to be enough interested in R yet, and that’s a real shame…R is just awesome and SAP HANA is more awesome…so bringing them together is…I think you got the idea…

First thing that came into my mind was a survey…first name, last name, country, age, *** and a favourite programming language…of course…I need a lot of records…more that I could generate by hand…so…a Python script came to the rescue…I thought…why I don’t take my team’s names and last names, countries of origin, made up some ages and other things just to fill up the mix. The script will basically generate random entries that will get loaded into SAP HANA.

Vote_Generator.py

import os

import sys

import random

names = [“Anne”, “Gigi”, “Juergen”, “Ingo”, “Inga”, “Alvaro”, “Mario”, “Julien”,

“Mike”, “Michael”, “Karin”, “Rui”, “John”, “Rocky”, “Sebastian”, “Kai-Yin”,

“Hester”, “Katrin”, “Uwe”, “Vitaliy”]

last_names = [“Hardy”, “Read”, “Schmerder”, “Sauerzapf”, “Bereza”, “Tejada”,

“Herger”, “Vayssiere”, “Flynn”, “Byczkowski”, “Schattka”,

“Nogueira”, “Mayerhofer”, “Ongkowidjojo”, “Wieczorek”, “Gau”, “Hilbrecht”,

“Staehr”, “Kylau”, “Rudnytskiy”]

age = [24, 34, 40, 38, 28, 36, 35, 42, 30, 37]

*** = [“M”, “F”]

country = [“Germany”, “France”, “Polland”, “Peru”, “Russia”, “USA”, “China”,

“Philippines”, “Portugal”, “Spain”]

vote = [“ABAP”, “Node”, “Ruby”, “Python”, “R”, “PHP”, “ActionScript”,

“Euphoria”, “Java”, “C++”]

def Generate_File(pSchema, pNumber):

    iNumber = 0

    pathname = os.path.dirname(sys.argv[0])

    pathname = os.path.abspath(pathname) + “\HANA_File.txt”

    myfile = open(pathname, “a”)

    while iNumber < pNumber:

        r_names = random.randrange(0, 20)

        r_lastnames = random.randrange(0, 20)

        r_age = random.randrange(0, 10)

        r_*** = random.randrange(0, 2)

        r_country = random.randrange(0, 10)

        r_vote = random.randrange(0, 10)

        iNumber += 1

        myfile.write(“insert into ” + pSchema + ” values(‘” +

        names[r_names] + “‘,'” + last_names[r_lastnames] + “‘,” +

        str(age[r_age]) + “,'” + ***[r_***] + “‘,'” + country[r_country] +

        “‘,'” + vote[r_vote] + “‘);\n”)

    myfile.close()

    print ‘The file ‘ + pathname + ‘ was written successfully’

schema = raw_input(“Schema: \n”)

num_files = input(“How many records?: \n”)

Generate_File(schema, num_files)

The execution is fair simple…we pass the schema and the number of files we want to generate…for this blog…200,000 records sounded like a good deal.

HANA_R_Votes_00.png

As you can see…some random funny names are generated, but after all, we’re a team so I don’t think they care too much.

Next step was to create the table in SAP HANA and load the records.

HANA_R_Votes_01.png

The main goal of our RHANA code is to have a detailed information on how many women or men of certain age from a certain country voted on the survey. Something like CHINA –> 24 –> F –> 890. (890 women of 24 years old from China voted on the survey). So of course, we need to create a table to handle that information as well.

HANA_R_Votes_02.png

With those pieces ready…we can code our script…

HANA_R_Votes.sql

DROP TYPE T_VOTES_DETAIL;

CREATE TYPE T_VOTES_DETAIL AS TABLE (

COUNTRY VARCHAR(15),

AGE CHAR(2),

*** CHAR(1),

FREQUENCY INTEGER

);

DROP PROCEDURE CALCULATE_VOTES;

DROP PROCEDURE GET_VOTES;

CREATE PROCEDURE CALCULATE_VOTES(IN votes VOTES, out votes_detail T_VOTES_DETAIL)

LANGUAGE RLANG AS

BEGIN

Country<-votes$COUNTRY

Age<-votes$AGE

***<-votes$***

Votes<-table(Country, Age, ***)

Stats<-data.frame(Votes)

Stats<-subset(Stats,(Stats$Freq>0))

Stats<-Stats[order(Stats$Country,Stats$Age),]

votes_detail<-data.frame(COUNTRY=Stats$Country,AGE=Stats$Age,***=Stats$***,FREQUENCY=Stats$Freq)

END;

CREATE PROCEDURE GET_VOTES()

LANGUAGE SQLSCRIPT AS

BEGIN

tab_votes = SELECT * FROM VOTES;

CALL CALCULATE_VOTES(:tab_votes, T_VOTES_DETAIL);

INSERT INTO “VOTES_DETAIL” SELECT * FROM :T_VOTES_DETAIL;

END;

CALL GET_VOTES();

SELECT * FROM VOTES_DETAIL

When everything is ready, we call the GET_VOTES() procedure which is going to read the 200,000K records from SAP HANA, send them to our R server to process the data and then send it back to SAP HANA to fill up a table with the information we want to take care of.

HANA_R_Votes_03.png

Yes…the whole process took less that 2 seconds…for 200,000K records…is not bad at all…

We can now select the information from our condensed table…

HANA_R_Votes_04.png

Now that we have the information we want, we can use SAP HANA analysis capabilities to…analyse our data…

HANA_R_Votes_05.png

HANA_R_Votes_06.png

I gotta say…that was pretty easy and fast…might not be the best scenario ever, but I’m sure it will you a better perspective of what can be done with SAP HANA and R.

To report this post you need to login first.

10 Comments

You must be Logged on to comment or reply to a post.

  1. Nathan Oyler

    Very cool! I really enjoy your blog posts, you put a lot of time and effort into the technical details, and spell out everything very well so it can be duplicated and expanded upon.

    Thank you, and I hope you continue to produce more!

    (0) 
    1. Lars Breddemann

      There is no “RHANA” package.

      You can actually hook up a SAP HANA instance with an RSERVE installation and develop and run actual R-code within SAP HANA, just as you do with SQLScript code.

      That’s all to be found in the documentation… 😉

      Blag was just generally referring to R-code in SAP HANA as “RHANA” code.

      – Lars

      (0) 
        1. Lars Breddemann

          I had to do some research on this… and you’re right.

          There is a package RHANA but it has never been released to customers outside SAP.

          If R-integration is an interesting option for your scenario, I’d recommend to start off with the released way of using it http://help.sap.com/hana/SAP_HANA_R_Integration_Guide_en.pdf.

          In case this is not sufficient involving the development team around the RHANA package might be the best next step.

          RHANA afaik is really “just” another way to facilitate the data transfer between SAP HANA and RSERVE. Function-wise it provides the same options as the standard integration.

          – Lars

          (0) 
              1. Henk Binnendijk

                completely true! i am looking for more efficient way to use R in combi with HANA. especially the performance is in focus, therefor the request for the RHANA package. But a better architecture will defently help.

                (0) 

Leave a Reply