Skip to Content

Hello Folks,

We all know that SAP (NYSE:SAP) is expanding its business intelligence application offerings with new predictive analysis software that helps businesses tap into big data.


The SAP BusinessObjects Predictive Analysis package includes predictive analysis algorithms, and development and visualization tools for building predictive data models used by analytical applications. SAP is making the software available to select customers now with general availability expected in September.

I thank SAP HANA Team for keeping the PAL Library reference guide, which is self explanatory and a very good guide to start with.

For PAL guide, refer to https://www.experiencesaphana.com/docs/DOC-1402

If you are using HANA Sandbox @Cloudshare, you can access PAL Libraries. If you find any problems in accessing these PAL Libraries and if you receive error message “No ScriptServer available” then you need to implement Note: 1650957 as a work around or you can request inmemorydevcenter@sap.com to enable the script server on your Cloudshare account

Well coming to my example, we will discuss about a sample scenario in which I will use “ABC” analysis to classify the data.

For more information, refer to http://en.wikipedia.org/wiki/ABC_analysis

Now let us get back to our example:

The data set on which we are going to examine contains 150 instances, which refers to a type of plant.

  1. Number of Instances: 150
  2. Number of Attributes: 4 numeric, predictive attributes and the class
  3. Attribute Information:
  • Sepal length in cm
  • Sepal width in cm
  • Petal length in cm
  • Petal width in cm

   4. Missing Attribute Values: None

We will now try to classify/distribute 33.3% for each of 3 classes.

Now I have created a table TESTDT_TAB as shown below:

/wp-content/uploads/2012/04/1_90411.png

As per the reference guide given by SAP, below is the required information for using PAL libraries in SAP HANA.


Prerequisites:

  1. Input data cannot contain null value.
  2. The item names in the Input table must be of string data type and be unique

Interface (abcAnalysis):

Function: pal::abcAnalysis

This function performs the ABC analysis algorithm.

L Function Signature:

pal::abcAnalysis ( Table<…> target, Table<…> args, Table<…> result)

Input Table:

/wp-content/uploads/2012/04/2_90412.png

Parameter Table:

/wp-content/uploads/2012/04/3_90485.png

Output Table:

/wp-content/uploads/2012/04/4_90486.png

Now basing on the “SepalLengthCM” we will try to classify the data into 3 Classes.

As we have PAL Libraries already installed, now we need to use these Libraries and write a procedure and call it to classify the data. Execute the following SQL Script:

SQL Script:

DROP TYPE DATA_T;

CREATE TYPE DATA_T AS TABLE(“PLANT” VARCHAR(100),“SepalLengthCM” DOUBLE,“SepalWidthCM” DOUBLE,“PetalLengthCM” DOUBLE,“PetalWidthCM” DOUBLE);

DROP TYPE CONTROL_T;

CREATE TYPE CONTROL_T AS TABLE(“Name” VARCHAR(100), “intArgs” INT, “doubleArgs” DOUBLE,”strArgs” VARCHAR(100));

DROP TYPE RESULT_T;

CREATE TYPE RESULT_T AS TABLE(“ABC” VARCHAR(10),”ITEM” VARCHAR(100));

DROP PROCEDURE palAbcAnalysis;

CREATE PROCEDURE palAbcAnalysis( IN target DATA_T, IN control CONTROL_T, OUT results RESULT_T )

LANGUAGE LLANG

AS

BEGIN

export Void main(Table<String “PLANT”, Double “SepalLengthCM”,Double “SepalWidthCM”,Double “PetalLengthCM”,Double “PetalWidthCM”> “target” targetTab,

Table<String “Name”, Int32 “intArgs”, Double “doubleArgs”,String “strArgs”> “control” controlTab,

Table<String “ABC”, String “ITEM”> “results” & resultsTab) {

pal::abcAnalysis(targetTab, controlTab, resultsTab);

}

END;

DROP TABLE #CONTROL_TBL;

CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TBL (“Name” VARCHAR(100), “intArgs” INT, “doubleArgs” DOUBLE,”strArgs” VARCHAR(100));

INSERT INTO #CONTROL_TBL VALUES (‘START_COLUMN’,0,null,NULL);

INSERT INTO #CONTROL_TBL VALUES (‘END_COLUMN’,1,null,null);

INSERT INTO #CONTROL_TBL VALUES (‘THREAD_NUMBER’,2,null,null);

INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_A’,null,0.33,null);

INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_B’,null,0.33,null);

INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_C’,null,0.33,null);

DROP TABLE RESULT_TBL;

CREATE COLUMN TABLE RESULT_TBL(“ABC” VARCHAR(10),”ITEM” VARCHAR(100));

CALL palAbcAnalysis(TESTDT_TAB, “#CONTROL_TBL”, RESULT_TBL) with overview;

SELECT * FROM RESULT_TBL;

Execute SQL Script by pressing “F8”.

Now we get the RESULT_TBL filled with the data which is classified as per our requirements as shown below:

/wp-content/uploads/2012/04/5_90487.png

In this example, I took a data set of 150 records (Very less amount), it took approximately 203ms for SAP HANA to analyze the data and classify it as shown above.

/wp-content/uploads/2012/04/6_90491.png

With SAP planning to offer Predictive Analysis tool set as a package combining the predictive analysis software licenses and HANA licenses. With this tool set Companies can spot fraudulent transactions and predict /forecast the sales.

With High processing capabilities coupled with high performance SAP HANA together with Predictive Analysis tool set can revolutionize the market standards and help companies run better.So am looking forward for the general availability of this tool.

Also read my blog on

SAP HANA: My Experiences with SAP Business Objects Predictive Analysis tool

http://scn.sap.com/community/in-memory-business-data-management/blog/2012/05/31/sap-hana-my-experiences-with-sap-business-objects-predictive-analysis-tool

Thanks for your time for reading this blog. Do comment your views.

To report this post you need to login first.

23 Comments

You must be Logged on to comment or reply to a post.

  1. Rama Shankar

    Krishna:

    Good write-up. Thanks for putting this together.

    One question: how does the 0.33 allocatioin work for Classification A/B/C. Can you please expand on how the allocation of plant is done in your sample.

    Rama

    (0) 
      1. Rama Shankar

        Krishna:

        No, my question was the overall logic you are trying to apply using predictive analysis.

        Let me guess:

        since you mention below for A, B and C as 0.33 – are you trying to read all 150 plants and evenly group them as 3 groups based on total: SepalLengthCM”,“SepalWidthCM”,“PetalLengthCM”,PetalWidthCM values for each plant? So, theoritically there will be 50 plants in each group A, B and C – correct?

        INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_A’,null,0.33,null);

        INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_B’,null,0.33,null);

        INSERT INTO #CONTROL_TBL VALUES (‘PERCENT_C’,null,0.33,null);

        (0) 
          1. varada santosh

            Hi Krishna

              I am very novice about the functionality , sorry if I am asking a basic question,I could not understand the below points.

            1.when we look at the parameter table there are 5 columns bit you created control table with 4 columns.

            2.Are you doing analysis based on any specific column or based on the number of rows.

            3.Also you mentioned in prerequisites it won’t allow NULL values.but while inserting into CONTROL_TBL you are inserting null values.

            4.For target tab which is our source as per my understanding contains two columns plant and sepallength but in the procedure while defining target tab there are 5 columns. why is it so

            5.finally there will be 3 result tables each contain 3 class record such a A,B,C.

            6.what is the essence of start column, end column,thread number.

            Please correct me if my understanding is wrong.

            can you explain the flow in little basic terms if possible.

            Thanks

            Santosh

            (0) 
          2. varada santosh

            Hi Krishna

              After going through it I understood it as below ,please correct me if I am wrong.

            We have dataset which contains 150 records and 5 columns this is input for the procedures and the control table which contains our parameters in our case the start column(0) is “PLANT” and end cloumn (1) is  “SepalLengthCM” and this procedure uses 2 threads classifiy the data based on ABC parameters.It gives us output as the classification and Plant name

            But one point which is still confusing to me is the parameter on which we are doing ABC analysis is it  “SepalLengthCM” and how does it recognize plat as Item is , does it take column 0 by default or is it possible to specify classification analysis, plant and SepalLengthCM in output

            Thanks

            Santosh

            (0) 
            1. Lalitha Swaroop Krishna Tangudu Post author

              yes and i feel that column 0 which we take must be “unique” something like primary key, i choose sepallengthCM as classifying factor and thereby gave control parameters for the sameand regarding null values, i dont have null values in my Test data as mentioned as prerequisite. Hope you understand now?

              (0) 
              1. varada santosh

                Hi Krishna

                1. Here in our above example we have start column as 0 which is a plant and end column is 1 which SepaLenghtCM, suppose if we take end column as 2 (assuming that SepalWidthCM  is also part of our input data), then what would be our classifying factor is it SepalWidthCM, in that case is it the end column which is being used for classification analysis or it is based on all the columns from start to end column in our case 0-2[Plant,SepalLengthCM,SepalWidthCM].

                2.In the output we are displaying Plant and SepalLengthCM but if we look at our result table definition it is as below

                CREATE TYPE RESULT_T AS TABLE(“ABC” VARCHAR(10),”ITEM” VARCHAR(100));

                how the system recognizes item field data is corresponding to plant values is it the starting column of our source data is treated as the item data?

                Thanks

                Santosh

                (0) 
                  1. varada santosh

                    Thanks Krishna.

                    Sorry for annoying you , left out with one more query , if the classifying factor is 0-2 columns how the plant numabrs column can be used for it as it is VARCHAR and it does not add any value to the calculation.

                    correct me if I am wrong.

                    Thanks

                    Santosh

                    (0) 
                    1. Lalitha Swaroop Krishna Tangudu Post author

                      Hello Santosh,

                      Sorry for the delayed response. I tried using “0-2” as classifying factor and yes the result is correct as mentioned by you. Thank you for analyzing and i take a point from here 🙂

                      Regards,

                      Krishna

                      (0) 
                      1. varada santosh

                        Hi

                        Thanks Krishna , with out your blog, may be I would not have not known about this ,thanks once again for sharing your wisdom.

                        To summarize .,if we say start column as 0,the first column of the input table would be displayed in output along with the table.

                        and if we say end column as 2 the classifying factor  will be 0-2 columns .

                        per-requisite is first column should be of varchar type.

                        Thanks

                        Santosh.

                        (0) 
  2. Zaib Attique Zia

    Hi Krishna,

    Thanks for sharing this.

    I would like to know more about L language. I am working on PAL with L language. Can you guide me that how i find tutorials to get understanding about L language. Its really hard to find information on L language.

    Best Regards,

    Zaib

    (0) 

Leave a Reply