Text Analysis in SAP HANA integrate with Twitter A...

ericdu · ‎04-09-2015

Note: This blog is not up to date, as new functionality has been introduced in later HANA revisions.

The example I am going to describe was created about two years ago, when Text Analysis was first introduced to the HANA platform. However, I think it is still a good example of how simple the text analysis feature is and how to program against HANA in Java, so I'd like to share it with you. I'm from the Startup Focus team; if you are a startup and interested in developing on HANA, visit here for more information.

Prerequisites

Register an Application at Twitter Developers

We are going to use the Twitter API to extract data from Twitter, so we first need to create an application at Twitter Developers. We will need the application's authentication information to invoke the APIs later.

In case you haven't used Twitter before, you need to create your Twitter account first. You can register an application and create your OAuth tokens at https://dev.twitter.com/. Log on with your Twitter account, click your profile picture, and then click “My applications”.

Click the button “Create a new application”.

Follow the instructions on the form to complete the registration. You need to enter the application name, a description, and your website, and leave the callback URL blank. Accept the developer rules and click the button “Create your Twitter application”.

After that, you will see the OAuth settings like below. Save the values of Consumer key, Consumer secret, Access token, and Access token secret. We will use them later in the APIs.
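As a rough illustration of keeping these four values out of the source code, here is a minimal sketch that reads them from a properties file and fails early if one is missing. The class TwitterCredentials is hypothetical and not part of the project. The key names follow the naming I know from Twitter4J's twitter4j.properties file, so treat them as an assumption and adjust them to your setup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical holder for the four OAuth values shown on the Twitter application page.
final class TwitterCredentials {
    final String consumerKey;
    final String consumerSecret;
    final String accessToken;
    final String accessTokenSecret;

    private TwitterCredentials(String consumerKey, String consumerSecret,
                               String accessToken, String accessTokenSecret) {
        this.consumerKey = consumerKey;
        this.consumerSecret = consumerSecret;
        this.accessToken = accessToken;
        this.accessTokenSecret = accessTokenSecret;
    }

    // Reads the four values and throws if any of them is missing or blank.
    static TwitterCredentials fromProperties(Properties props) {
        return new TwitterCredentials(
                require(props, "oauth.consumerKey"),
                require(props, "oauth.consumerSecret"),
                require(props, "oauth.accessToken"),
                require(props, "oauth.accessTokenSecret"));
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing OAuth setting: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values only; in practice load a file that is not checked in.
        Properties props = new Properties();
        props.load(new StringReader(
                "oauth.consumerKey=YOUR_CONSUMER_KEY\n"
              + "oauth.consumerSecret=YOUR_CONSUMER_SECRET\n"
              + "oauth.accessToken=YOUR_ACCESS_TOKEN\n"
              + "oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET\n"));
        TwitterCredentials credentials = fromProperties(props);
        System.out.println("Consumer key length: " + credentials.consumerKey.length());
    }
}
```

Failing fast on a missing value makes a copy-and-paste mistake show up at startup rather than as an authentication error on the first API call.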

Download Twitter API Java library – Twitter4J

Twitter4J is an unofficial open source Java library for the Twitter API. With Twitter4J, you can easily integrate your Java application with the Twitter services. The link to download it is http://twitter4j.org/en/index.html.

Extract the downloaded zip file and go to the subfolder lib, where you will see the file twitter4j-core-3.0.3.jar. This is the library we need in the Java project, and it must be added as a library of the project or to the class path of the Java runtime.

There are some useful examples, and you can simply check them to get familiar with the Twitter APIs.

Prepare the HANA JDBC library

To access SAP HANA from Java, we need the JDBC library, which you can find at C:\Program Files\SAP\hdbclient\ngdbc.jar on Windows and /usr/sap/hdbclient/ngdbc.jar on Linux in a default installation.
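Before going further, it can help to verify that both jar files are really on the class path and to see what a HANA JDBC URL looks like. The sketch below is only an illustration: the class HanaJdbcCheck is hypothetical, and the port pattern 3<instance number>15 is an assumption that fits a default single-database installation, so check the SQL port of your own system.

```java
import java.util.Locale;

// Hypothetical helper; not part of the TwitterAnalysis project.
final class HanaJdbcCheck {
    // Driver class shipped in ngdbc.jar and a core interface from twitter4j-core.
    static final String HANA_DRIVER_CLASS = "com.sap.db.jdbc.Driver";
    static final String TWITTER4J_CLASS = "twitter4j.Twitter";

    // Returns true if the named class can be found on the class path.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, HanaJdbcCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Builds a URL such as jdbc:sap://localhost:30015/ for instance number 00.
    // Assumption: the SQL port follows the pattern 3<instance number>15.
    static String jdbcUrl(String host, int instanceNumber) {
        if (instanceNumber < 0 || instanceNumber > 99) {
            throw new IllegalArgumentException("Instance number must be between 0 and 99");
        }
        return String.format(Locale.ROOT, "jdbc:sap://%s:3%02d15/", host, instanceNumber);
    }

    public static void main(String[] args) {
        System.out.println("HANA JDBC driver found: " + isOnClasspath(HANA_DRIVER_CLASS));
        System.out.println("Twitter4J found: " + isOnClasspath(TWITTER4J_CLASS));
        System.out.println("JDBC URL: " + jdbcUrl("localhost", 0));
    }
}
```

With the URL in hand, a connection is opened in the usual JDBC way with java.sql.DriverManager.getConnection(url, user, password), using your own HANA user and password.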

Exercise

Now we are ready to go. By the end of the blog, we will understand the source code of the project and know how to connect to HANA from Java and how to use the Twitter services in Java. The most impressive thing is how simple it is to run text analysis in HANA, which combines unstructured data from various sources, such as Twitter and documents, with the structured data in the RDBMS.
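To give a first impression of how little is needed on the HANA side, here is a small sketch that builds the SQL for a full-text index with text analysis switched on. The class TextAnalysisSql and the names TWEETS, CONTENT and TWEETS_FTI are hypothetical and chosen only for illustration, and the configuration EXTRACTION_CORE_VOICEOFCUSTOMER is one of the standard configurations that I assume is available on your system.

```java
// Hypothetical helper that only builds SQL strings; it does not talk to HANA.
final class TextAnalysisSql {
    // Quotes an identifier and escapes embedded double quotes.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Statement that creates a full-text index and enables text analysis on it.
    static String createIndex(String indexName, String table, String column) {
        return "CREATE FULLTEXT INDEX " + quote(indexName)
                + " ON " + quote(table) + "(" + quote(column) + ")"
                + " CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'"
                + " TEXT ANALYSIS ON";
    }

    // HANA writes the analysis results into a table named $TA_<index name>.
    static String resultTable(String indexName) {
        return quote("$TA_" + indexName);
    }

    // Counts the extracted tokens per type, for example sentiments or topics.
    static String countByType(String indexName) {
        return "SELECT \"TA_TYPE\", COUNT(*) FROM " + resultTable(indexName)
                + " GROUP BY \"TA_TYPE\"";
    }

    public static void main(String[] args) {
        System.out.println(createIndex("TWEETS_FTI", "TWEETS", "CONTENT"));
        System.out.println(countByType("TWEETS_FTI"));
    }
}
```

The generated statements can be run through a plain java.sql.Statement on a HANA connection. The point of the sketch is that the analysis itself is a single CREATE statement, and the results are read back with ordinary SQL.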

Import the Java Project in Eclipse

To save you time, I will upload the project here later so you can import the existing Java project instead of starting from scratch. Do not worry, we will explain all the components of the project in detail below. Open your HANA Studio and follow the steps below:

1. In the File menu, choose Import....

2. Select the import source General > Existing Projects into Workspace and choose Next. You should have created the workspace in the XS exercise. If not, create your workspace first.

3. Select the root directory where your project files are located, select the project TwitterAnalysis, and click Finish to complete the import.

The project structure looks like this:

Understand the Java Project

The following table lists the major files in the project. We will explain them in detail later in the exercise.
