Part 2: Refine processing of tweets with Text Analysis, Fuzzy Search and clustering, build web services, and set up the SAP HANA Cloud Connector
Here I am with the second part of the blog. In the first part we did some basic installations, created a Tweets table, set up the tunnel to the HANA database and crawled the tweets. In this part we will perform text analysis, use fuzzy search, run clustering (from the Predictive Analysis Library, PAL), build web services using HANA native development, set up authorizations, and install and configure the SAP HANA Cloud Connector.
Besides the possibility to persist data in both row and column stores, SAP HANA also provides numerous libraries to query or manipulate data, such as predictive analysis, text mining, geospatial and graph functionality. I will be using the geospatial functionality in part 3.
Performing Text Analysis
Mapping Business Needs to Text Analysis
SAP HANA Text Analysis has market-leading, out-of-the-box predefined entity types that are packaged as part of the platform. Looking at a sentence, clause, paragraph, or document, the technology can identify the “who”, “what”, “where”, “when” and “how much” and classify it accordingly.
To know more about Text analysis please click here.
Let’s focus on performing some sentiment analysis using HANA. To create a text index on the tweets, execute the following code in the SQL editor in Eclipse:
-- CREATE TEXT ANALYSIS INDEX ON TWEETS
CREATE FULLTEXT INDEX "tweets" ON "Tweets"("text")
LANGUAGE COLUMN "lang"
LANGUAGE DETECTION ('EN','FR','DE','ES','ZH')
TEXT ANALYSIS ON;
Now a $TA_tweets index table is created under your schema (NEO_xxxxx) in the Tables folder. Right-click on $TA_tweets and open the data preview to see some important columns such as TA_RULE, TA_TOKEN and TA_TYPE. With this index we can identify strong positive, weak positive, strong negative and weak negative sentiments.
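For a quick overview of how the sentiments are distributed, you can aggregate on TA_TYPE. The query below is only a sketch: it assumes sentiment TA_TYPE values such as 'StrongPositiveSentiment' actually appear in your $TA_tweets index (they depend on the text analysis configuration), and NEO_XXXXX must be replaced with your schema.

```sql
-- Sketch: count tokens per sentiment type found by text analysis
-- (assumes sentiment TA_TYPE values appear in your index;
--  replace NEO_XXXXX with your schema name)
SELECT "TA_TYPE", COUNT(*) AS "cnt"
FROM "NEO_XXXXX"."$TA_tweets"
WHERE "TA_TYPE" LIKE '%Sentiment%'
GROUP BY "TA_TYPE"
ORDER BY "cnt" DESC;
```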
Similarly, let us create a tweeters view with influence and stance scores by executing the SQL script "setupView.sql" (from the SmartApp->scripts folder downloaded in part 1) in the SQL window in Eclipse. Refresh the Views list of your schema to see the "tweeters" view. Right-click on it, then choose "Open data preview" to see the influence and stance of people.
To get more insights on Predictive Analysis Library (PAL) please see the documentation.
Using Fuzzy Search
Fuzzy search is a type of search that finds matches even when users misspell words or enter only part of a word.
Let us see how it works. Execute the following SQL statement in the SQL editor in Eclipse, replacing NEO_XXXXX with your schema and 'SAP' with the product or company name you wish to search for.
SELECT SCORE() AS score, * FROM "NEO_XXXXX"."$TA_tweets"
WHERE CONTAINS(TA_TOKEN, 'SAP', FUZZY(0.5))
ORDER BY score DESC;
The output of the fuzzy search contains many related entries. Based on the fuzzy factor (0.5 in this case), it also considers similar words, in this case "SAPNL", "sapnl", "SAPAG" etc.
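The fuzzy factor controls how tolerant the matching is: values closer to 1.0 demand near-exact matches, while lower values admit more variants. As a sketch against the same index (replace NEO_XXXXX with your schema):

```sql
-- Tighter matching: only tokens very close to 'SAP' score high
SELECT SCORE() AS score, TA_TOKEN
FROM "NEO_XXXXX"."$TA_tweets"
WHERE CONTAINS(TA_TOKEN, 'SAP', FUZZY(0.9))
ORDER BY score DESC;
```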
Performing Clustering
To perform clustering, go to the scripts folder in the downloaded SmartApp and copy the contents of predictive.sql into the SQL window in Eclipse.
We need to make some changes in the code, i.e. replace "NEO_" with your schema name "NEO_XXXXXX". Execute the "create stored procedure" part by selecting that code snippet in the SQL window; this creates the tweeters cluster procedure. Refresh the catalog and you will find it under another schema, _SYS_AFL -> Procedures -> NEO_XXXX_TWEETERS_CLUSTER.
Then create the parameter tables and the output tables, followed by the views for OData, by selecting the respective code snippets in the SQL console.
Similarly, to run the clustering we have to execute the consecutive "runtime" code.
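The runtime part essentially calls the generated PAL procedure with its input, control and output tables, roughly like this. This is only a sketch: the procedure name and the PAL_* table names are assumptions based on the schema names above; the actual names come from predictive.sql.

```sql
-- Hypothetical call of the generated clustering procedure
-- (procedure and table names assumed; adjust to your predictive.sql)
CALL "_SYS_AFL"."NEO_XXXX_TWEETERS_CLUSTER"(PAL_DATA, PAL_CONTROL, PAL_RESULTS, PAL_CENTERS) WITH OVERVIEW;
```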
Execute the following statements individually to see results such as user, clusterNumber, distance, stance and influence:
SELECT * FROM "Tweets";
SELECT * FROM "$TA_tweets";
SELECT * FROM PAL_RESULTS;
SELECT * FROM PAL_CENTERS;
SELECT * FROM "TweetersClustered";
SELECT * FROM "Clusters";
Click on Views under your schema (NEO_xxxx), right-click on the "TweetersClustered" view, open the data preview and have a look at the analysis as shown below:
TweetersClustered view with the Raw data
TweetersClustered view with the Analysis
Build web services
Let us navigate to the HCP (HANA Cloud Platform) cockpit; in part 1 we created a HANA instance called "dev". Click on the Development Tools link under HANA XS Applications; it will authenticate you and open the HANA Web-based Development Workbench. Here we can develop our project as a web service that will be available on the web.
To know more about the HANA Web-based Development Workbench please click here.
In the Workbench, create an application by right-clicking on dev (content->pxxxxtrial->dev); in the popup select the template "Empty application", enter "pxxxxtrial.dev.SmartApp" as the package, then click Create. Now we can see the SmartApp project with xsaccess, xsapp and index.html files.
Setup authorizations for the application
Delete the index.html file that was created by default. Open the downloaded SmartApp->services->user.hdbrole and xsprivileges files, globally replace 'neo_' and 'pxxx' with your schema name (NEO_xxxx) and p-number (pxxxx), then save. Drag and drop the user.hdbrole and xsprivileges files onto the Multi-file Drop Zone in the Workbench. We will get an error message, because a few more steps are needed before the application is complete. Refresh the project to see the files in the Workbench and click Save.
Similarly, open the xsaccess file from the downloaded SmartApp->services folder and globally replace 'neo_' and 'pxxx' with your schema name (NEO_xxxx) and p-number (pxxxx) respectively. Remove the comment line in the file, then drag and drop it onto the Multi-file Drop Zone, refresh the application and save it.
We need to execute the following statement in the SQL editor in Eclipse in order to authorize the user, replacing pxxx with your p-number.
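The statement grants the activated design-time role to your trial user via the standard _SYS_REPO procedure. The sketch below makes assumptions: the role path 'pxxxxtrial.dev.SmartApp::user' is inferred from the package created above, and both it and the user name must be adjusted to your own package and p-number.

```sql
-- Hypothetical grant of the activated HDB role to the trial user
-- (role path and user name assumed; adjust to your package and p-number)
CALL "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('pxxxxtrial.dev.SmartApp::user', 'PXXXX');
```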
Set up access to tables and views via the OData protocol: open the services.xsodata file from the downloaded SmartApp->services folder, globally replace 'NEO_' with your schema name (NEO_xxxx) and save it. Drag and drop services.xsodata onto the Multi-file Drop Zone in the Workbench, refresh the application, then activate the OData file.
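For orientation, an xsodata service definition exposing tables and views typically looks like the sketch below. The entity names here are assumptions based on the objects created earlier; the actual services.xsodata in the SmartApp download may expose different entities.

```
service {
  "NEO_XXXX"."Tweets" as "Tweets";
  "NEO_XXXX"."TweetersClustered" as "TweetersClustered" keys generate local "GenID";
  "NEO_XXXX"."Clusters" as "Clusters" keys generate local "GenID";
}
```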
Open the services.xsjs file from the downloaded SmartApp->services folder. Globally replace 'neo_' and 'pxxx' with your schema name (NEO_xxxx) and p-number (pxxxx) respectively. Drag and drop services.xsjs onto the Multi-file Drop Zone, refresh the application, then save it.
Note: If you want to know about debugging the application please click here .
Set up the authentication proxy
The destination performs basic authentication, so the proxy will handle authentication and access to the HANA trial instance, then send the results back to the UI.
Note: This proxy is only needed for the trial developer version; it is not necessary in a production environment, where the backend system can be accessed directly.
Now get the tool for the proxy, hanatrial-auth-proxy (written in Node.js). Firstly, let me thank Mr. Gregor Wolf for this amazing tool for the HCP trial version. Please click here for the tool. Shift+right-click on the downloaded folder and open a command window there. Install the Node.js modules using the command "npm install".
In hanatrial-auth-proxy->examples->config.js we need to set the host where our application runs (for example, s11hanaxs.hanatrial.ondemand.com). Save it, go back to the command prompt and run "node examples/server-basic-auth.js"; the HANA Cloud Trial Authentication Proxy for HANA XS Services is then ready.
Open a new window in your browser (I am using Chrome), type 'localhost:7891/pxxxxtrial/dev/SmartApp/services.xsodata/$metadata' and hit Enter. You will be prompted to log in; use your p-number (pxxxx) as username and the password that you use to log in to the HCP cockpit. The proxy authenticates you against your SAP HANA instance and returns the results, so effectively we can access the service through the local web server, i.e. the proxy.
Setup SAP HANA Cloud Connector
Install the HANA Cloud Connector from https://tools.hana.ondemand.com/#cloud, either the developer or the productive version. If you want to know more about the installation process, see the documentation. On Windows, double-click the downloaded msi file; it may ask you to install a Microsoft Visual C++ runtime first, so click yes, download it from Microsoft and run it. Then install the msi file, leaving the port at its default (8443).
After installation, you can start and stop the server from the Windows start menu, where new links for the Cloud Connector have appeared. Open the browser and go to https://localhost:8443. If a certificate warning page shows up, the reason is that the connector's certificate is self-signed, so just click 'Proceed to localhost'.
For the first login the default username is "Administrator" and the password is "manage". Choose Master (Primary Installation) and then change the password, which is required after the first login.
Now set up the initial configuration as follows
If you are working behind a firewall you need to provide proxy settings; otherwise this is not necessary. Click Apply; you are then directed to the administration console, where we can see that the HANA Cloud Connector is connected but no active resources are available. So we need to set up resources.
Go to access control and click on Add..(Add System Mapping) and insert the details as shown below:
Select the protocol for communication as HTTP:
Give internal host and internal port as shown below:
Give the virtual name as hanaxs.virtual and virtual port as 7891.
Make sure the check-box for checking the availability of the internal host is marked and click Finish.
Now we can see that a mapping has been created for us: just click on it.
Add a resource by clicking "Add…" under Resources Accessible on your local host. Give your pnumbertrial (pxxxxtrial) as the URL path and set the Access Policy to 'Path and all sub-paths'. Save it and check that the URL path is enabled and its status is green.
We have a node application that is running on localhost on port 8888, to make this available from the web and HTML5 app we need to configure it in the HANA cloud connector.
Go to “Access control” on the left side of the page and click on ‘Add..’(Add System Mapping) and insert the details as shown below:
Select Back-end Type as Non-SAP System:
Select protocol for communication as HTTP:
Add internalhost and internal port as shown below:
Add Virtual Host and Virtual Port as shown below:
Make sure the check-box for checking the availability of the internal host is marked and click Finish.
The mapping is created; click on it and add a resource with "do" as the URL path (because it is the top-level resource for this application) and 'Path and all sub-paths' as the Access Policy, then save it. Now we have two mappings in total.
Then we need to define the destinations that reference the mappings in the Cloud Connector. Go to the downloaded SmartApp folder->destinations->hanaxs, replace the user with your username (pxxxx) and save it.
Go to the SAP HANA Cloud Platform cockpit, select Destinations and click Import from file, then select the downloaded SmartApp folder->destinations->hanaxs. The settings are taken from the file and look as follows:
Replace the user with your p-number (pxxxx), enter the password and save. Now we can see the destination has been configured.
Similarly, for the Node.js application, select Import from file, then select SmartApp folder->destinations->nodejs. The settings are taken from the file and look as follows:
Just save it; no changes are required. So now two destinations are defined in total.
Thanks for reading. Hope you enjoyed the blog 🙂
In the next part of the blog we will move ahead to build an attractive web-based UI (user interface) using SAPUI5 and will perform some customizations using analytic tools.