Skip to Content

Note: This blog is not up to date as new functionalities have been introduced in late HANA revisions.


The example I am going to describe was actually created about two years ago when Text Analysis was first time introduced to HANA Platform.  However, I think it is still a good example to demonstrate how simple the text analysis feature is and programming with HANA in Java language that I’d like to share to you. I’m from the Startup Focus team, if you are a startup and interested in developing on HANA, visit here for more information.


Prerequisites


Register an Application at Twitter Developers

As we are going to use the Twitter API to extract the data from Twitter, it is required to create an application at Twitter Developer and we will need the authentication information of the application and use them to invoke the APIs later.


In case you haven’t use Twitter before, you need to create your twitter account firstly. You can register an application and create your oAuth Tokens at https://dev.twitter.com/. Logon with your twitter account, click your profile picture and click the “My applications”.

Snip20150409_2.png

Click the button “Create a new application”.

Snip20150409_3.png

Follow the form instructions to complete the registration. You need to input the application name, description, your websites and leave the call back URL as blank. Accept the developer rules and click the button “Create your Twitter application”.

Snip20150409_4.png

After that, you will be able to see the oAuth settings like below, save the values of Consumer Key, Consumer secret, Access token and Access token secret. We need to use them later in the APIs.

Snip20150409_5.png

Download Twitter API Java library – Twitter4J

Twitter4J is an unofficial open source Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter services. The link to download it is http://twitter4j.org/en/index.html.

Extracting the downloaded zip file, go the sub folder lib and you will see the file twitter4j-core-3.0.3.jar, which is the library we need in the Java project and it must be added as the library or class path in the java runtime.

Snip20150409_6.png

There are some useful examples and you can simply check them to help yourselves getting familiar with the Twitter APIs.

Prepare the HANA jdbc library

In order to access SAP HANA from java, we will need the jdbc library, which you can find it at C:\Program Files\SAP\hdbclient\ngdbc.jar in windows and /usr/sap/hdbclient/ngdbc.jar in Linux by the default installation.

Exercise


Now it is ready to go, in the end of the blog, we will understand the source code of the project and know how to connect HANA from java, how to use the twitter services in java and the most impressive thing is how simple it is to run the text analysis in HANA, which combines the unstructured data from various sources like twitter, documents with the structured data in RDBMS.


Import the Java Project in Eclipse


To save your time, I will upload the project here later so you can import the existing java project instead of starting from scratch. Do not worry and we will explain all the components of the project in details below. Open your HANA Studio and follow the steps below:


1. In the File menu, choose Import….

2. Select the import source General > Existing Projects into Workspace and choose Next. You should have created the workspace in the XS exercise. Otherwise, you may need to have your workspace created first.

3. Select the root directory where your project files located, selects the project TwitterAnalysis and click Finish to complete the import.


The project structure looks like this:


Snip20150409_7.png

Understand the Java Project


The following table lists the major files in the project and we will explain them in details later in the exercise.

Directory

File

Description

src

TwitterConnection.java

Build the connection to twitter services

HDBConnection.java

Build the jdbc connection to HANA

Configurations.java

The public interface for the network, twitter authentication configurations, override it by your own account or settings

Tweet.java

The java bean class for the tweet objects

TweetDAO.java

The data access object

lib

ngdbc.jar

SAP HANA jdbc library

twitter4j-core-3.0.3.jar

Twitter4j library for twitter services in java

CreateTable.sql

The SQL statement to create the column table in HANA

CreateFullTextIndex.sql

The SQL statement to create the fulltext index for text analysis

Readme.txt

The file describes the steps to execute the project

Create a column table in HANA


Firstly, we need to create a table in HANA, where we want to store the tweets we fetched from the twitter services.


1. Open HANA Studio, copy the SQL statement from the CreateTable.sql and execute it in the SQL Console. You need to replace the current schema with your own schema.

Snip20150409_8.png

2. Expand the Catalog folder in HANA studio, you should find the table TWEETS in your schema and the definition of the table is like:

Snip20150409_9.png

Update the configurations


In the purpose to maintain the configurations easily, we put all the required information in a single interface and it is mandatory for you update it with your own account or settings before you can connect to either HANA or Twitter.


1. Open the file Configurations.java in your project. Basically, there are 4 category of setting you can override:

  • Network Proxy Settings: The proxy host and port, set the HAS_PROXY as false if you do not need to

    use proxy

  • HANA Connection Settings: Replace the HANA URL with your own HANA host and port, user,

    password and the schema where you created your table

  • Twitter Authentication Settings: Replace with your own authentication information from your twitter

    application as described in the prerequisites

  • Search Term: We will search the twitter based on the search term “startup” and we want to know what

    people were talking around the startups in twitter. You can always replace it with your own term if you are interested in other topics

Snip20150409_10.png


Test Connection to Twitter


Once have the twitter authentication maintained correctly in the previous step. You can open TwitterConnection.java and run it. You will see the message “Connection to Twitter Successfully!” following with your twitter user id in the console as the screenshot shows below.

Snip20150409_11.png


Test Connection to SAP HANA

Now let us open the file HDBConnection.java and run it. You will see the message “Connection to HANA Successfully!” in the console as the screenshot shows below. Check the Configurations.java if you encountering any issue. Snip20150409_12.png

The data access object TweetDAO is the single point to communicate with HANA from java, take a look how the source code looks like and you will know the SQL statement and how to use the jdbc library.

Snip20150409_13.png

Invoke Twitter API and save the tweets into HANA


Now it’s time to the do the real stuff. Open the file SearchTweets.java and run it, which will search the tweets based on the search term we specified in the Configurations.java and everything we got will saved to HANA table. You will see the messages in the console indicate the tweets have been inserted to HANA successfully like the screenshot shows:


Snip20150409_14.png

After that, you can run the data preview in HANA studio and see the contents of the table TWEETS in your schema like this:


Snip20150409_15.png


Run text analysis in HANA

Now we already have the tweets stored in the HANA table. The next step, we are going to run the text analysis to see what people are talking around the “startup” in twitter.

To run the text analysis, the only thing we need to do is creating a fulltext index for the column of the table we want to analysis and HANA will process the linguistic analysis, entity extraction, stemming for us and save the results in a generated table $TA_YOUR_INDEX_NAME at the same schema. After that, you can build views on top of the table and leverage all existing analysis tools around HANA to do the visualization even the predictive analysis.

1. Copy the SQL statement from the CreateFullTextIndex.sql and execute it in SQL console:


— Replace the Scheme with your own Schema! —
SET SCHEMA “I045664”;
DROP FULLTEXT INDEX “TWEETS_FTI”;

Create FullText Index “TWEETS_FTI” On “TWEETS”(“TEXT”) TEXT ANALYSIS ON CONFIGURATION ‘EXTRACTION_CORE’;


2. Do you believe the text analysis is already done by HANA? Yes, it is. Now you know how simple it is! You will be able to find a generated table $TA_TWEETS_FTI in your schema. The structure of the table looks like this, which is the standardized format for the results of text analysis:


Snip20150409_16.png

3. And here is the data preview of the $TA table, you will see the Tokens extracted from the tweets and the number of occurrence and entity type of each token.


Snip20150409_17.png

4. Based on this, you can use the knowledge you learned in the previous modelling exercises and use the table to build a view if you want. Here, we just go to the Analysis tab and build a tag cloud like this:


Snip20150409_18.png

To report this post you need to login first.

8 Comments

You must be Logged on to comment or reply to a post.

  1. Werner Daehn

    Thanks for the explanation, Eric, that answers one of my open questions I always wanted to know but never found the time to dive into.

    On the Twitter connectivity however, such an Adapter exists already, no need to write one.

    Here a video of the Hana Academy on how to use the Adapter to copy the data to Hana and keep pushing Tweets from Twitter in realtime.

    SAP HANA Academy – Smart Data Integration/Quality : Twitter Replication Pt 3 of 3 [SPS09] – YouTube

    And this video explains how to configure the Adapter. (There is a third video, on how to install all)

    SAP HANA Academy – Smart Data Integration/Quality : Twitter Replication Pt 2 of 3 [SPS09] – YouTube

    (0) 
    1. Eric Du Post author

      The Smart Data Integration and Smart Data Quality is different, with various adaptors you can combine different kinds of data sources and perform data transformations, the Twitter Library used in this blog is for java programmers who is interested to build something by their owns.

      (0) 
      1. Werner Daehn

        Not getting the point. If I write the SQL command

        create remote subscription s_twitter using

        (select * from v_tweets)

        target table TWEETS;

        I get 100% of what you did above. Get all tweets into the Hana table called TWEETS.

        Plus the advantage that all is administrated, monitored etc from within Hana.

        And if a user wants to build a Java program to get data into Hana, it would probably bmake sense to write an adapter using the Java Adapter SDK for Hana. Just like the twitter adapter was built.

        Anyway, just want to show the options.

        (0) 
        1. Eric Du Post author

          Yes, there are various options available. It really depends on the architecture design, leverage as many HANA native features as possible or use HANA as a fast DB only.

          (0) 
    1. Eric Du Post author

      Hello Christoforos,

      You can find the $TA table as the result of the text analysis, from where you can declare a web service for the UI layer to consume.

      Please also check the limitation of Twitter APIs like the total number of calls allowed to see if that restricts your use case.

      Regards,

      Eric Du

      (0) 

Leave a Reply