Skip to Content

Most of you might not be aware of a feature introduced on SAP HANA SPS5. This new feature is called “Flexible Tables”, which means that you can define a table that will grow depending on your needs. Let’s see an example…

You define a table with ID, NAME and LAST_NAME. The table works fine, but you realize that you need to also add the PhoneNumber and Address…in a normal situation, you will need to open the definition and add those fields…but using Flexibles Tables, you will need to only add those fields as part of the INSERT query and let SAP HANA do it’s magic…

Of course, that scenario is very unlikely to happen, because for a couple of fields, it doesn’t make sense…so…where do we use this Flexible Tables? Enterprise Search, were we need to have all products on one table, and this table can have a really big amount of columns…as different products will have different characteristics…

So…for this blog…I really break my head trying to find a simple scenario that could cover Flexible Tables…what I came up with? SAP HANA, Flexible Tables, R and Twitter…

The code is very complex, so I’m not going to explain line by line how it works…but of course, I’m going to give a nice overview…

The Twitter API (I’m using version 1 even when it’s deprecated, simply because version 1.1 deals with Authentication and didn’t want to spend too much time on that…) allows us to fetch information from Twitter…so in this case I was interested in the Hashtags…the ones that starts with an “#” and are used to identify and organize certain tweets related to an event, technology or famous person. (I’m using only Tweets, not taking Retweets into account)…

Diagram_Flexible_HANA_01.png

Using R, I read the User Timeline to get the most recent 200 tweets from a particular account. This information will be send back to SAP HANA to be stored.

Diagram_Flexible_HANA_02.pngWith the 200 tweets, I extract all the Hashtags, summarize them and then save the information both in the final table and in a intermediate table (to be used for the next user). In this intermediate table, I will store a long string with all the Hashtags separated by a comma.

When the next user arrives, all 200 tweets are read, Hashtags extracted, combine with the one saved in the intermediate table, summarized (This is very important because we want to keep track of the previous Hashtags, both the ones that are common to both users and the ones that only exist on the first or second user) and the information will be saved in the final table and the information from the next user will be save (replacing the previous one) as a long string in the intermediate table.

Why I need to this? Simple…let’s say that the first user has 3 Hashtags…#SAPHANA, #R and #Python with values 3, 2 and 1. The next user will have 5 Hashtags…#SAPHANA, #SAP, #Ruby, #IPhone and #Android with the values 2, 5, 1, 4 and 3.

When we store the first user we will have:

UserName SAPHANA R Python
First_User 3 2 1

When we store the second user, we will have…

UserName SAPHANA R Python SAP Ruby IPhone Android
First_User 3 2 1 ? ? ? ?
Next_User 2 0 0 5 1 4 3

Now…you may wonder…why R has “0” for the next user and SAP has “?” for the first user? Easy…as you can see…as we added more fields (at runtime) the table grow…the R field was already there for the first user so it got a “0” for the next user, however SAP wasn’t there before, so we don’t actually know what should the value for the first user…so a “?” will be in place…

I’m sure you will have a better picture when you see the images of the table after I show the source code…

First, we need to create a table called “TWITTER_USERS”, that will hold the users that we want to work with…

Flexible_HANA_001.pngThen, we need another table were we are going to store the Hashtags and its values as a long string. This table will be called “FIRST_HASH”.

Flexible_HANA_003.png

Now, things get interesting, as we’re going to create our Flexible Table using a very simple command…this table will be called “TWITTER_HASHTAGS”.

Twitter_Hashtags.sql

CREATE COLUMN TABLE TWITTER_HASHTAGS(

USERNAME NVARCHAR(10)

) WITH SCHEMA FLEXIBILITY;

This table will look pretty regular when watching its definition…but it’s a Flexible Table…as you can see…we only defined one field…so this table can grow and grow and grow 😉

Flexible_HANA_002.pngNext, we need to create a couple of type tables to allow to interact between SAP HANA and R…

Table_Types.sql

CREATE TYPE T_COL_NAMES AS TABLE(

COL_NAMES NVARCHAR(1000)

);

CREATE TYPE T_COL_VALUES AS TABLE(

COL_VALUES NVARCHAR(1000)

);

And now…we’re ready to start with the code…one R procedure and two SQLScript procedures…

GET_HASHTAGS.sql

CREATE PROCEDURE GET_HASHTAGS(IN twittername TWITTER_USERS,IN first_hash FIRST_HASH,

                              OUT out_col_names T_COL_NAMES, OUT out_col_values T_COL_VALUES)

LANGUAGE RLANG AS

BEGIN

UserName = twittername$USERNAME

hashline = first_hash$HASH_LINE

hashvalues = first_hash$HASH_VALUES

Get_Twitter<-function(p_source,p_pattern){

  datalines = grep(p_pattern,web_page,value=TRUE)

  getexpr = function(s,g)substring(s,g,g+attr(g,’match.length’)-1)

  g_list = gregexpr(p_pattern,datalines)

  matches = mapply(getexpr,datalines,g_list)

  result = gsub(p_pattern,’\\1′,matches)

  names(result) = NULL

  return(result)

}

Get_Hashtags<-function(p_source){

  check<-!length(grep(‘\\”([^,\\”]+)\\”‘, as.character(p_source)))

  if(!check){

    mypattern = ‘\\”([^,\\”]+)\\”‘

    datalines = grep(mypattern,p_source,value=TRUE)

    getexpr = function(s,g)substring(s,g,g+attr(g,’match.length’)-1)

    g_list = gregexpr(mypattern,datalines)

    matches = mapply(getexpr,datalines,g_list)

    result = gsub(mypattern,’\\1′,matches)

    names(result) = NULL

    return(result)

  }else{

    result<-p_source

    return(result)

  }

}

url<-paste(“http://api.twitter.com/1/statuses/user_timeline.xml?count=200&screen_name=“,UserName,sep=””)

mypattern = ‘<text>([^<]*)</text>’

web_page<-readLines(url)

tweets<-Get_Twitter(web_page,mypattern)

mypattern = ‘[^\\&]#(\\.?\\w+)’

hash_list<-Get_Twitter(tweets,mypattern)

hashtags<-sapply(hash_list,Get_Hashtags)

hashtags<-as.vector(unlist(hashtags))

hashtags<-toupper(hashtags)

dt.hashtags<-data.frame(UserName,hashtags)

tab.hashtags<-table(dt.hashtags)

dt.hashtags<-as.data.frame.matrix(tab.hashtags)

hashtags_names<-names(dt.hashtags)

hashtags_names<-gsub(“^\\.”,”,hashtags_names)

if(length(hashline>=1)){

          hash_line<-gsub(“^(\\w)+\\,”,”,hashline)

          hash_line<-unlist(strsplit(hash_line, split=”,”))

          hash_values<-gsub(“^(\\’+\\w+\\’)+\\,”,”,hashvalues)

          hash_values<-as.numeric(unlist(strsplit(hash_values, split=”,”)))

          hash_frame<-data.frame(names=hash_line,values=hash_values)

          hash_frame[“values”]<-0

          Col_Names<-“”

          Col_Values<-“”

          for(i in 1:length(hashtags_names)){

                      Col_Names<-paste(Col_Names,hashtags_names[i],sep=”,”)

                      Col_Values<-paste(Col_Values,dt.hashtags[,i],sep=”,”)

          }

           Col_Names<-gsub(“^\\,|\\.”,”,Col_Names)

          Col_Values<-gsub(“^\\,|\\.”,”,Col_Values)

          Col_Names<-unlist(strsplit(Col_Names, split=”,”))

          Col_Values<-as.numeric(unlist(strsplit(Col_Values, split=”,”)))

          new_hash_frame<-data.frame(names=Col_Names,values=Col_Values)

          new_hash_frame<-rbind(hash_frame,new_hash_frame)

          new_hash_frame<-aggregate(values ~ names, FUN = “sum”, data = new_hash_frame)

          new_hash_names<-new_hash_frame$names

          new_hash_values<-new_hash_frame$values

          Col_Names<-“USERNAME”

          Col_Values<-paste(“‘”,UserName,”‘”,sep=””)

           for(i in 1:length(new_hash_names)){

                      Col_Names<-paste(Col_Names,new_hash_names[i],sep=”,”)

                      Col_Values<-paste(Col_Values,new_hash_values[i],sep=”,”)

          }

}else{

          Col_Names<-“USERNAME”

          Col_Values<-paste(“‘”,UserName,”‘”,sep=””)

          for(i in 1:length(hashtags_names)){

                      Col_Names<-paste(Col_Names,hashtags_names[i],sep=”,”)

                      Col_Values<-paste(Col_Values,dt.hashtags[,i],sep=”,”)

          }

}

col_names<-gsub(“^\\,\\.?”,”,Col_Names)

col_values<-gsub(“^\\,”,”,Col_Values)

out_col_names<-data.frame(COL_NAMES=col_names)

out_col_values<-data.frame(COL_VALUES=col_values)

END;

SAVE_HASHTAGS.sql

CREATE PROCEDURE SAVE_HASHTAGS(IN in_col_names T_COL_NAMES, IN in_col_values T_COL_VALUES)

LANGUAGE SQLSCRIPT AS

v_select VARCHAR(2000);

v_col_names_char NVARCHAR(1000);

v_col_values_char NVARCHAR(1000);

CURSOR c_cursor1 FOR

SELECT COL_NAMES FROM :in_col_names;

CURSOR c_cursor2 FOR

SELECT COL_VALUES FROM :in_col_values;

BEGIN

                    OPEN c_cursor1;

                    FETCH c_cursor1 into v_col_names_char;

                    CLOSE c_cursor1;

                    OPEN c_cursor2;

                    FETCH c_cursor2 into v_col_values_char;

                    CLOSE c_cursor2;

                    DELETE FROM FIRST_HASH;

                    INSERT INTO FIRST_HASH VALUES(:v_col_names_char,:v_col_values_char);

                    v_select := ‘INSERT INTO TWITTER_HASHTAGS (‘ || v_col_names_char || ‘) VALUES (‘ || v_col_values_char || ‘)’;

                    EXEC v_select;

END;

GET_TWITTER_USERS.sql

CREATE PROCEDURE GET_TWITTER_USERS(UserName NVARCHAR(10))

LANGUAGE SQLSCRIPT AS

BEGIN

          Twitter_Users = SELECT USERNAME FROM TWITTER_USERS WHERE USERNAME = :UserName;

          First_Hash = SELECT HASH_LINE, HASH_VALUES FROM FIRST_HASH;

          CALL GET_HASHTAGS(:Twitter_Users,:First_Hash,T_COL_NAMES,T_COL_VALUES);

          CALL SAVE_HASHTAGS(:T_COL_NAMES,:T_COL_VALUES);

END;

In order for this to work, we need to insert some values on our “TWITTER_USERS” table…

Twitter_Users.png

And then, simply call the “GET_TWITTER_USERS” procedure…

CALL_GET_TWITTER_USERS.sql

CALL GET_TWITTER_USERS(‘Blag’);

CALL GET_TWITTER_USERS(‘Schmerdy’);

CALL GET_TWITTER_USERS(‘ggread’);

When we execute the first call…that’s it with user @Blag we will have the following on the “FIRST_HASH” table…

Flexible_HANA_004.png

And this on our “TWITTER_HASHTAGS” Flexible Table…

Flexible_HANA_005.png

As you can see…our table started only with USERNAME…but as we pass in the Hashtags and its values…the table grew to able to hold them…

When we call the next user…that’s @Schmerdy we will have this on the Flexible Table…

Flexible_HANA_006.png

As you can see…in all the Hashtags that belong to @Blag but doesn’t belong to @Schmerdy we have a “0” value…so what will happen to the ones that belong to @Schmerdy but not to @Blag?

Flexible_HANA_007.png

Those field will have a “?” value, as they didn’t exits before we add them…and again…the table grew to hold all the new fields….

Now…something interesting is that @Schmerdy had more Hashtags than @Blag…so what will happen when we call the last user which is @ggread that by the way…has less Hashtags than @Schmerdy and @Blag…

Flexible_HANA_008.png

@ggread will have a value “0” in all the Hashtags that doesn’t belong to the user…but will have a value in the one that are similar to @Schmerdy…

Flexible_HANA_009.png

So…what will happen with the Hashtags that belongs to @ggread but doesn’t exist on @Schmerdy or @Blag? Easy…they will be added and some “?” value will be placed where those Hashtags didn’t exist before…

Flexible_HANA_010.png

I wish I could put the whole table…but it contains more than 50 columns…so better…I can export them to a .CSV file…and do some analysis using Visual Intelligence…

Flexible_HANA_011.png

Here, we can see how often these three user have used the Hashtags #SAPHANA, #SAP and #SAPTECHED in their last 200 tweets…

So…that’s it…a nice and simple way to demonstrate how the Flexible Tables work in SAP HANA by using my always beloved R 🙂

To report this post you need to login first.

8 Comments

You must be Logged on to comment or reply to a post.

  1. Lars Breddemann

    Nice write up and pretty forward going!

    And it’s a nice thing to have ‘flexible’ tables, which really means to have a ‘flexible schemas’.

    The core issue with that is something you stroke: finding a valid use case for this.

    Indeed, this feature is not something that fits in every case.

    A discussion on the “noSchema”-approach (just a step more general than ‘flexible schema’) that I think is very spot on can be found here: http://www.youtube.com/watch?v=8kotnF6hfd8

    I really like to see how, where and why this feature gets used in real implementations.

    Thanks Blag for this one!

    – Lars

    (0) 
    1. Alvaro Tejada Galindo Post author

      Lars!

      Yes…actually the term is Flexible Schemas…but Flexible Tables sound more understandable 😉

      And true…most of my time was spend of finding out an scenario that somehow made sense…using the Flight tables didn’t really appeal to me this time 😛

      Will watch the video…thanks for the link 🙂

      I will love to see it in real implementations! That’s for sure 🙂 I hope that SAP releases soon a more detailed explanation that the I’m giving in this blog…after all…every time I write a blog, I put myself in the position of a Developer not of an SAP Employee 😛

      Greetings,

      Blag.

      (0) 
  2. Albrecht Gass

    Alvaro,

    coming up with the example must have been hard, but it makes a good point. 🙂

    Have you done any performance analysis on how adding columns affects the runtime? I would assume the performance penalty will be minor in a column store but who knows. Also are the limits on the number of columns?

    Albrecht

    (0) 
    1. Alvaro Tejada Galindo Post author

      Albrecht:

      Yeah…it was happy, but I can’t be happier with the result 😉

      In term of performance…well…as you can see I only used 3 Twiiter users…but the whole process didn’t took more than 1 second…so I don’t think the impact would be critical at all…

      The limit…is…hold on in you chair…64K columns…pretty impressive, huh? 😉 But of course…I haven’t try it 😛

      Greetings,

      Blag.

      (0) 
  3. Sesha Ongole

    Hi Alvaro,

    Excellent blog with practical examples.

    I have another good example for this flexible schemas.

    In order to provide a markdown prices,manufacturers get a feed of thier product stock,price points and units sold on a daily basis from retailers(Sell thru info).

    They have to perform the analysis of whether to markdown the price and push volumes or take back the stock.

    The data will be heavy for FMCG products.

    With incoming data having different product charecteristics every time,Flexible schemas will be of great help in building faster analytics.

    Sesha

    (0) 
    1. Alvaro Tejada Galindo Post author

      Sesha:

      Thanks 🙂

      And yes…that would be a really nice scenario for “Flexible Tables” 🙂 If you could write a blog about it…it would be just great 😉

      Greetings,

      Blag.

      (0) 

Leave a Reply