Over on SAPHANA.com I posted a blog (it may not be live yet, bear with me!) about Influencer Analysis using SAP River, HANA and Lumira. That blog deals mostly with the analysis, whilst this one is about the making of the app.

Where did the idea come from?

After SAP River was released, I came to think about potential use cases and I really wanted to build an app that’s a bit more than the standard “movie casting” app that is in the developer notes. To do this, I needed an interesting data source and I was reminded of the beta SCN API which was created by Matthias Steiner and the SCN team. The SCN API is in beta for testing and legal reasons, so I can’t reveal the means to access it. But, it is largely based on the Jive REST API.

I figured that I could use the code I wrote a few days ago, which integrates Python with River, to inject data into SAP River. I thought I’d then start to use the power of the HANA platform by integrating HANA Text Analysis for Sentiment Analysis and then expose it using SAP Lumira. And then, to make it interesting, I gave myself one day to write the app and one day to create a story, document and blog about it.

The whole point of SAP River is that it’s supposed to be easy to use and fast to develop, so this should be possible, right?

Building the River app

In my process this was a bit iterative, as I poked at the SCN API to find the data that I wanted, but here’s my RDL code. It’s pretty simplistic and it describes SCN Spaces, Content and Authors. I decided to put Blogs and Documents as Content, so I could easily aggregate based on both. I defined contentType as an enumerated type, so when I insert items later, I specify which type of content I’m inserting into HANA.

Screen Shot 2013-12-31 at 1.40.17 PM.png
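
For readers who cannot see the screenshot, here is a rough Python mirror of the entities described above. It is not RDL and not the original model: the field names are hypothetical, and only the Space/Content/Author split, the content-type enumeration, and the likes and replies measures come from the post.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    # The post treats blogs and documents as two kinds of Content.
    BLOG = "blog"
    DOCUMENT = "document"


@dataclass
class Space:
    space_id: str  # hypothetical field names throughout
    name: str


@dataclass
class Author:
    author_id: str
    name: str


@dataclass
class Content:
    content_id: str
    content_type: ContentType
    space: Space
    author: Author
    subject: str
    body: str
    likes: int = 0
    replies: int = 0


# Made-up example record, only to show how the pieces relate.
post = Content(
    content_id="example-1",
    content_type=ContentType.BLOG,
    space=Space("space-1", "Example Space"),
    author=Author("author-1", "Example Author"),
    subject="Hello SCN",
    body="Some text",
)
```

Keeping the content type on a single Content entity is what allows blogs and documents to be aggregated together or separately later on.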

What’s fantastic about RDL is that it then generates the HANA tables, views, entity relationships and OData services for you. Done. Now we can get on with loading data. Here’s a sample table:

Screen Shot 2013-12-31 at 1.57.39 PM.png

Loading data into HANA

This is pretty easy. I used Python as my language of choice, and Sublime Text for editing – thanks DJ Adams and Brenton O’Callaghan for the advice there. Here’s my code to load into HANA. I’m sure there are better ways of doing this; I’m a hacker, not a programmer.

Screen Shot 2013-12-31 at 1.47.57 PM.png
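
Since the loader only survives as a screenshot, the following is a minimal sketch of what posting one record to a generated OData service might look like, using only the Python standard library. The host, service path, entity set name, field names and credentials are all hypothetical, and a real XS endpoint may well need extra headers (a CSRF token, for example) that are not shown here.

```python
import base64
import json
import urllib.request


def build_insert_request(service_url, entity_set, record, user, password):
    """Build (but do not send) an OData POST that would create one entity."""
    body = json.dumps(record).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{service_url.rstrip('/')}/{entity_set}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, service path, entity set and credentials.
request = build_insert_request(
    "https://hana.example.com:4300/scn/odata",
    "Content",
    {"contentId": "12345", "subject": "Hello SCN"},
    "DEMO_USER",
    "not-a-real-password",
)
# urllib.request.urlopen(request) would actually send it; left out on purpose.
```

Building the request separately from sending it makes it easy to inspect the payload when an insert is rejected.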

There are a few gotchas:

– The SAP River UTCTimestamp uses the OData format and requires dates in “milliseconds since the epoch” which is very frustrating. That’s the reason for the weird time conversion code. Blame Microsoft for this!

– You have to re-encode the SCN Content and other UTF-8 data in JSON, or it will fail, hence the json.dumps

– I do some funny work to turn the blog URL into an ID for later use

– These aren’t my real hostname, username or password 🙂

– I found that for complex views (e.g. give me all the spaces I haven’t downloaded yet from SCN), it can be necessary to create HANA views and manual xsodata services. Not a big deal.
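
The first three gotchas can be sketched as small helpers. These are assumptions rather than the original code: the input timestamp format (ISO 8601 with a numeric offset) is a guess at what the API returns, the `/Date(...)/` wrapper is the standard OData V2 JSON date literal (the post does not show whether River wants the wrapper or the bare number), and the URL-to-ID scheme is purely hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_timestamp):
    """Convert e.g. '2013-12-31T13:40:17.000+0000' to milliseconds since the epoch."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Integer division by a timedelta avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def odata_v2_date(millis):
    """Wrap milliseconds in the OData V2 JSON date literal."""
    return f"/Date({millis})/"


def json_safe(text):
    """Re-encode text as a JSON string; non-ASCII characters become \\uXXXX escapes."""
    return json.dumps(text)


def content_id_from_url(url):
    """Hypothetical ID scheme: the URL path with slashes replaced by dashes."""
    return urlparse(url).path.strip("/").replace("/", "-")
```

For example, `to_epoch_millis("2014-01-01T00:00:00.000+0000")` gives `1388534400000`, and `odata_v2_date(1388534400000)` gives `/Date(1388534400000)/`.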

Enabling Text Search and Sentiment Analysis

That’s the best part – and this couldn’t be easier. It’s one command! Note that this uses the Voice of Customer configuration, which includes sentiment analysis as well as text extraction. You can define your own dictionaries if you want to, but I didn’t do this.

Screen Shot 2013-12-31 at 1.56.51 PM.png
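
The exact statement in the screenshot is not recoverable, so this is only a sketch of the kind of command involved: a HANA `CREATE FULLTEXT INDEX` with the Voice of Customer configuration and text analysis switched on, assembled here as a Python string. The table and column names are hypothetical; the index name `VOICE` is inferred from the `$TA_VOICE` table name mentioned in the post.

```python
def create_voc_index_sql(index_name, table, column):
    """Return a CREATE FULLTEXT INDEX statement using the Voice of Customer configuration."""
    return (
        f'CREATE FULLTEXT INDEX "{index_name}" ON {table} ("{column}") '
        "CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' "
        "TEXT ANALYSIS ON"
    )


# Hypothetical schema, table and column names.
sql = create_voc_index_sql("VOICE", '"SCN"."Content"', "body")
print(sql)
```

HANA writes the text analysis results to a table named `$TA_` followed by the index name, which is why the index name matters.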

Now, this actually creates a new database table called $TA_VOICE. It contains 1m text terms for my 40k pieces of content and it looks like this:

Screen Shot 2013-12-31 at 2.02.03 PM.png

Yes, I filtered on “unambiguous profanity” 🙂
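
A filter like that one could be written as a query along the following lines. This is a sketch, not the query behind the screenshot: `TA_TOKEN` and `TA_TYPE` are standard columns of a HANA text analysis table, but the type label to bind is left as a parameter because the exact string depends on the configuration.

```python
def top_tokens_sql(ta_table='"$TA_VOICE"', limit=20):
    """Return SQL listing the most frequent tokens for one text analysis type."""
    return (
        "SELECT TA_TOKEN, COUNT(*) AS OCCURRENCES "
        f"FROM {ta_table} "
        "WHERE TA_TYPE = ? "  # bind the type label at execution time
        "GROUP BY TA_TOKEN "
        "ORDER BY OCCURRENCES DESC "
        f"LIMIT {int(limit)}"
    )


query = top_tokens_sql()
print(query)
```

The same shape of query works for the sentiment types: bind a different type label and the result is the most common terms for that sentiment.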

When the underlying table is updated, the text index is updated with it.

Building the HANA Model

Note that I can also build the HANA model inside the SAP HANA Developer Perspective, right inside my RDL project. It’s advantageous to do this because I can keep all my developer artifacts in one place, and transport them together between systems.

Screen Shot 2013-12-31 at 2.07.28 PM.png

I did this the regular HANA way – an Attribute View to join the Time Dimension, and then an Analytic view for my Content. This allows me to quickly aggregate and view data based on date, author, content and space. It takes 100ms to materialize the whole 40k row table.

Screen Shot 2013-12-31 at 2.03.44 PM.png

Now, because my Voice of Customer table is also a fact table, I need to create a Calculation View so I can have a single Information Model. I do it like this:

Screen Shot 2013-12-31 at 2.06.01 PM.png

I now have one Information Model that can answer any question about SCN data that we choose to ask. Unfortunately, for either API or privacy reasons, there are a bunch of things that I’ve not been able to extract, like Company, Country information or Badges, as well as ratings. It’s a shame but such is life.

Connecting to HANA with Lumira

Now we can connect right on into HANA with Lumira.

Screen Shot 2013-12-31 at 2.10.53 PM.png

Our Influencer Dataset is immediately available and we can see our attributes and measures:

Screen Shot 2013-12-31 at 2.11.18 PM.png

And here’s a sample graphic – Top 20 Blog/Document writers over all of SCN for 2013, also ranked by number of likes and replies. Congratulations Tammy Powlas!

Screen Shot 2013-12-31 at 2.21.06 PM.png

Conclusions

I hope this makes interesting reading; it was certainly very interesting to build. You can head over to SAPHANA.com if you want to see a more detailed influencer analysis; this is the “building of” blog. It’s worth noting that I started this at 9am on Monday, and it’s now 2.30pm on Tuesday: the River app is built, the data has been troubleshot and loaded (data is always the hardest thing), the text analysis is complete and the HANA models are designed. The SAP Lumira analysis has been completed and two blogs have been written describing the process.

This is what I hoped to achieve and this is the point of SAP River!

In 2014, the SAP HANA Application Platform is clearly really going to come of age, and the ability to quickly build transactional apps using SAP River and push Big Data into SAP HANA is a very powerful concept. In addition, the ability to then add text search and analysis, spatial, predictive and graph capabilities to these is very exciting.

A quick thanks to Matthias Steiner and his SCN API, to everyone who engaged with me on Twitter last night and gave me ideas to make this blog better, and to the SAP HANA, River and Lumira teams, all of whom are working with me right now to make the products even better.

Have a very Happy New Year, and I look forward to working with you all in 2014.

14 Comments

  1. Jody Hesch

    Awesome blog, John. Fantastic to see what’s possible with SAP River (and the rest of the HANA landscape)! One of these days I’ll have to “jump in” 🙂

  2. Tom Van Doorslaer

    Great initiative.

    now expand it into a complete dashboard for behaviour analysis 🙂

    Seriously though: really cool combination of technologies:

    – scnappy

    – river

    – hana

    – lumira

    worthy of a demojam?

  3. Matthias Steiner

    Now THAT is interesting! While I have to say that some of your results need to be taken with a grain of salt (at least) I do love the fact that you gave #scnappy a test-drive and that you shared your experience with the rest of us! KUDOS!

    It’s definitely an interesting mix of technologies you used here and the fact that you were able to achieve what you wanted within just a few hours of hacking speaks for itself irt developer-productivity of SAP’s new technologies. 😉

    Regarding the SCN API and when it’ll see the light … wish I could give you an exact date, but that’s up to higher powers… I’m positive though!

    1. John Appleby Post author

      Thanks. I was on another River call today with a partner that I’m co-innovating with, and they are used to regular DBMS development. His comment was “I’m very impressed – this would have taken days with a regular DBMS”.

      What’s interesting is that most tech which allows you to prototype rapidly isn’t suitable for enterprise apps – take Visual Basic, and Perl, as two good examples.

      And hopefully this helped you in your mission to get scnappy productionized in SAP – I noticed that before it was a solution looking for a problem.

      1. Matthias Steiner

        “And hopefully this helped you in your mission to get scnappy productionized in SAP – I noticed that before it was a solution looking for a problem.”

        That’s a misconception John! There are already several parties (internal and external) using the API for various projects, yet most of the communication happens behind the scenes. Your project was just one of many, and of course every input is useful to help making the API production-ready. Thanks for that!

  4. Irshaad Bijan Adatia

    Going to be trying to do this myself with a different data set.

    I like the Easter egg of knowing that an API is in the works for the SCN 😉

    River was only recently stumbled upon via someone’s tweet, and I’ve been seeing how it can integrate with HANA and eventually, in some form or fashion, Lumira.

    Will update you when I’ve got everything down to beautiful art.

    Thanks again John~

  5. James Rapp

    This is epic stuff … it would be a nice addition if River allowed you to create the information models in your source code to make this accessible to the SAP BI clients immediately!

    1. John Appleby Post author

      Glad you enjoyed, thanks for the Twitter love.

      Yes, and the great thing about HANA is that you get bi-yearly releases. Jacob Klein and team are working hard on the next release. Hopefully it will bring this and some other goodies!
