Skip to Content
Author's profile photo Tom Flanagan

[SAP HANA Academy] Live3: Preform Clustering

[Update: April 5th, 2016 –  The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trail edition of HCP.]


The SAP HANA Academy’s Philip Mugglestone continues along in Live3 course by reviewing and running the SQL Script necessary to setup and execute K-means clustering analysis using the SAP HANA predictive analysis library. Philip also shows how to create views that will be accessed via web services down the road. Please watch Philip’s video below.

Screen Shot 2015-04-16 at 1.04.43 PM.png

(0:30 – 2:00) Pasting Predictive SQL code and Replacing Schema Name

With Notepad++ open file 05 setupPredictive.sql from the scripts folder of the Live3 code repository on GitHub. Copy and paste the code into a SQL console in Eclipse.

First make sure you’re set to the proper schema. Second you must paste your schema name everywhere in the code where it currently has neo_ as a placeholder. There are five lines of code that must be corrected and you do have the option of preforming a global replace.

(2:00 – 4:35) Examination of the Setup Predictive Code – HCP AFL Wrapper Generator Stored Procedure

The first part of the code will preform a clean up in case the syntax has been previously run.

The next section creates a stored procedure using a special HCP stored procedure called HCP AFL Wrapper Generator. As HCP is a multi-tenant version of SAP HANA, this stored procedure is slightly different than the on-premise SAP HANA AFL Wrapper Generator. This stored procedure creates a set of input table types that reflects the input data table, the parameter table and the two results tables.

Screen Shot 2015-04-16 at 1.15.25 PM.png

A reference to these table types is put into a signature table. Then the Wrapper Generator is called with the name of the procedure we want it to create (Tweeters_Cluster), the name of the algorithm we want to use (KMEANS) and the four input/output tables we want to use from the signature table. This allows SAP HANA to generate a stored procedure to preform the clustering that we can then call later on.

After executing the stored procedure, Philip examines it in the procedures folder of the newly created SYS_AFL schema.

(4:35 – 6:15) Examination of the Setup Predictive Code – Creating Parameter and Output Tables

The next part of the code creates the parameter table that will be used. The table’s type has already been defined in the stored procedure so a column table can be created using the like statement. Then values are inserted into the table. Philip recommends optimizing the number of clusters that will be created by setting a minimum value of 5 and a maximum value of 10. Execute that section of code to create the parameter table.

Screen Shot 2015-04-16 at 1.56.00 PM.png

Continuing on in the code is two lines that create a pair of empty output tables called PAL_RESULTS and PAL_Centers. These output tables are created again with the like statement so they use the table types that have already been set up. The table types respect the structure that was listed in the documentation.

(6:15 – 8: 35) Running the Code and Examining the Tables


To actually run the clustering is rather simple. First the results tables must be empty and then the stored procedure must be called. This is the section that is run on a regular basis, possibly every minute or hour. The input table/view is the Twitter view, which will send real-time data straight into the stored procedure in the SAP HANA engine. The stored procedure does an intense job of reading the entire set of data and iterating it up to 100 times to work out the clusters.

At the end of the code is a set of select statements that let us see the data in the output tables. The first result table shows a cluster number between 0 and 9 for each of the Twitter users from the input data. The distance number in the results table details how far a user is away from the center of their assigned cluster. The second output table shows the stance and influence value for each cluster.

Screen Shot 2015-04-16 at 2.17.32 PM.png

(8:35 – 10:30) Examination of the Setup Predictive Code – Creating Views for Web Services

The code also creates a pair of views. The TweetersClustered view pulls in the original tweeting information, adds 1 to the cluster number and counts the total number of tweets each user sent. This enables us to see the stance, influence, the total number of tweets and the cluster number for each Twitter user.

Screen Shot 2015-04-16 at 2.16.40 PM.png

The Clusters view shows the center of the clusters. This view adds one to the cluster’s value and shows the number of people in the cluster using a join of a select of the number of users assigned to each cluster.

(10:30 – 12:00) Analysis of the TweetersClustered View

Open a data preview for the TweetsClustered view in the HCP schema and go to the Analysis tab. Drag the user and the clusterNumber over on the Labels axis and the stance and influence over on the Values axis. Change to a bubble chart to see a plot of the influence and stance for each individual Twitter user with a color for each individual cluster number.

Screen Shot 2015-04-16 at 2.14.34 PM.png

Follow along with the Live3 on HCP course here.


SAP HANA Academy over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.


Follow @saphanaacademy

Assigned Tags

      8 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Former Member
      Former Member

      Hi,

      I am receiving following error message when I execute sql statement.

      /wp-content/uploads/2016/03/aflprocfailed_916345.png

      Can someone or Philip Mugglestone suggest a resolution for it?

      Regards

      Angad

      Author's profile photo Philip MUGGLESTONE
      Philip MUGGLESTONE

      Hi Angad,

      The live3 tutorials were created using the HCP free developer trial landscape as of early 2015. However the landscape has evolved significantly since then. So there could be all sorts of reasons why you see this issue.

      Which HANA database system are you using (HANA <shared>, HANA XS <shared>, or HANA MDC <trial>) and which version (for example, 1.00.102.03)?

      I just tried the same code against my NEO_xxx schema with HANA XS <shared> for 1.00.102 and it worked fine.

      Are you sure to have created all the underlying tables and views in your NEO_xxx schema?

      Thanks,

      Philip

      Author's profile photo Former Member
      Former Member

      Hi Philip,

      Thanks for you reply.

      I am using HANA<shared> database with version 1.00.102.03.1449674847

      I dropped the all table types, tables and views. Later, I recreated them.

      CONTENT.png

      This time the generator came up with following error/warning:

      SQL Warning.png

      If I check the PAL_SIGNATURE definition then it is correctly defined as per the sql statement.

      PAL_SIGNATURE.png

      Further execution of SQL statement throws following warning/error:

      CALL_PAL_PROC_MSG.png

      Cheers

      Angad

      Author's profile photo Philip MUGGLESTONE
      Philip MUGGLESTONE

      Hi Angad,

      The project was designed for HANA XS <shared>. Can you try with that? I just deleted it on my trial account and created a new one, ran the code, and all worked well. So I'm unable to reproduce the issue. Perhaps you may need to delete then recreate your HANA XS <shared> ?

      Thanks,

      Philip

      Author's profile photo Former Member
      Former Member

      My apologies. The instance is already HANA XS <shared>.

      /wp-content/uploads/2016/03/image_919782.jpeg

      Author's profile photo Former Member
      Former Member

      Hi Philip,

      I was able to get rid of warnings by replacing "DOUBLE" with "INTEGER" data type for column "influence" and varchar(100) with varchar(1000) for column "typename".

      2016-04-02 19_23_12.png

      Why Integer?

      2016-04-02 19_30_48.png

      Why varchar (1000)?

      I did not try to find it as this might be required by signature table type for HCP AFL wrapper generator.

      Regards

      Angad

      Author's profile photo Philip MUGGLESTONE
      Philip MUGGLESTONE

      Hi Angad,

      There were some changes to the structure of PAL signature tables in SPS 10 however the "old" code should still work. It's often a good idea to follow the warning messages as you have done. Changes are not explicitly documented for the HCP free developer trial where they have non-standard usage (HCP.HCP_AFL_WRAPPER_GENERATOR) however the regular PAL reference guide may be of use: Calling PAL Functions - SAP HANA Predictive Analysis Library (PAL) - SAP Library

      Regards,

      Philip

      Author's profile photo Former Member
      Former Member

      Hi Philip,

      I really appreciate your help and love the way you explain in HANA academy videos as well as on SCN. Looking forward to complete your WebIDE for HANA series.

      Cheers

      Angad