SAP HANA: My experiences on using Predictive Analysis Toolset with HANA
Hello Folks,
We all know that SAP (NYSE:SAP) is expanding its business intelligence application offerings with new predictive analysis software that helps businesses tap into big data.
The SAP BusinessObjects Predictive Analysis package includes predictive analysis algorithms, and development and visualization tools for building predictive data models used by analytical applications. SAP is making the software available to select customers now with general availability expected in September.
I thank the SAP HANA team for the PAL Library reference guide, which is self-explanatory and a very good place to start.
For PAL guide, refer to https://www.experiencesaphana.com/docs/DOC-1402
If you are using the HANA Sandbox @Cloudshare, you can access the PAL libraries. If you run into problems accessing them and receive the error message "No ScriptServer available", implement SAP Note 1650957 as a workaround, or request inmemorydevcenter@sap.com to enable the script server on your Cloudshare account.
Coming to my example: we will discuss a sample scenario in which I use "ABC" analysis to classify the data.
For more information, refer to http://en.wikipedia.org/wiki/ABC_analysis
Now let us get back to our example:
The data set we are going to examine contains 150 instances, each referring to a type of plant.
- Number of Instances: 150
- Number of Attributes: 4 numeric, predictive attributes and the class
- Attribute Information:
  - Sepal length in cm
  - Sepal width in cm
  - Petal length in cm
  - Petal width in cm
- Missing Attribute Values: None
We will now classify the data, distributing roughly 33.3% of it to each of 3 classes.
Now I have created a table TESTDT_TAB as shown below:
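The table definition was originally shown as a screenshot. As a minimal sketch, assuming column names matching the DATA_T type used in the procedure later in this post, the DDL would look roughly like this (the sample "PLANT" item names are hypothetical, chosen only to satisfy the uniqueness prerequisite):

```sql
-- Illustrative only: TESTDT_TAB holding the 150-row plant data set.
-- The "PLANT" values must be unique strings (a PAL prerequisite).
CREATE COLUMN TABLE TESTDT_TAB(
  "PLANT"         VARCHAR(100),
  "SepalLengthCM" DOUBLE,
  "SepalWidthCM"  DOUBLE,
  "PetalLengthCM" DOUBLE,
  "PetalWidthCM"  DOUBLE
);
-- A few sample rows (hypothetical item names):
INSERT INTO TESTDT_TAB VALUES ('Plant-001', 5.1, 3.5, 1.4, 0.2);
INSERT INTO TESTDT_TAB VALUES ('Plant-002', 7.0, 3.2, 4.7, 1.4);
INSERT INTO TESTDT_TAB VALUES ('Plant-003', 6.3, 3.3, 6.0, 2.5);
```

In the actual scenario this table is loaded with all 150 records.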
As per the reference guide provided by SAP, below is the information required to use the PAL libraries in SAP HANA.
Prerequisites:
- Input data cannot contain null values.
- The item names in the input table must be of string data type and must be unique.
Interface (abcAnalysis):
Function: pal::abcAnalysis
This function performs the ABC analysis algorithm.
Function Signature:
pal::abcAnalysis ( Table<…> target, Table<…> args, Table<…> result)
Input Table:
Parameter Table:
Output Table:
Now, based on "SepalLengthCM", we will classify the data into 3 classes.
As the PAL libraries are already installed, we now need to write a procedure that uses them and call it to classify the data. Execute the following SQL script:
SQL Script:
DROP TYPE DATA_T;
CREATE TYPE DATA_T AS TABLE("PLANT" VARCHAR(100), "SepalLengthCM" DOUBLE, "SepalWidthCM" DOUBLE, "PetalLengthCM" DOUBLE, "PetalWidthCM" DOUBLE);
DROP TYPE CONTROL_T;
CREATE TYPE CONTROL_T AS TABLE("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
DROP TYPE RESULT_T;
CREATE TYPE RESULT_T AS TABLE("ABC" VARCHAR(10), "ITEM" VARCHAR(100));
DROP PROCEDURE palAbcAnalysis;
CREATE PROCEDURE palAbcAnalysis( IN target DATA_T, IN control CONTROL_T, OUT results RESULT_T )
LANGUAGE LLANG
AS
BEGIN
export Void main(Table<String "PLANT", Double "SepalLengthCM", Double "SepalWidthCM", Double "PetalLengthCM", Double "PetalWidthCM"> "target" targetTab,
Table<String "Name", Int32 "intArgs", Double "doubleArgs", String "strArgs"> "control" controlTab,
Table<String "ABC", String "ITEM"> "results" & resultsTab) {
pal::abcAnalysis(targetTab, controlTab, resultsTab);
}
END;
DROP TABLE #CONTROL_TBL;
CREATE LOCAL TEMPORARY COLUMN TABLE #CONTROL_TBL ("Name" VARCHAR(100), "intArgs" INT, "doubleArgs" DOUBLE, "strArgs" VARCHAR(100));
INSERT INTO #CONTROL_TBL VALUES ('START_COLUMN', 0, NULL, NULL);
INSERT INTO #CONTROL_TBL VALUES ('END_COLUMN', 1, NULL, NULL);
INSERT INTO #CONTROL_TBL VALUES ('THREAD_NUMBER', 2, NULL, NULL);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_A', NULL, 0.33, NULL);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_B', NULL, 0.33, NULL);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_C', NULL, 0.33, NULL);
DROP TABLE RESULT_TBL;
CREATE COLUMN TABLE RESULT_TBL("ABC" VARCHAR(10), "ITEM" VARCHAR(100));
CALL palAbcAnalysis(TESTDT_TAB, "#CONTROL_TBL", RESULT_TBL) WITH OVERVIEW;
SELECT * FROM RESULT_TBL;
Execute the SQL script by pressing "F8".
RESULT_TBL is now filled with the data, classified as per our requirements, as shown below:
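As a quick sanity check on the result (a sketch, assuming the RESULT_TBL created above), you can count the rows per class; with PERCENT_A/B/C all set to 0.33, roughly 50 of the 150 plants should land in each class:

```sql
-- Row count per ABC class in the result table.
SELECT "ABC", COUNT(*) AS "CNT"
FROM RESULT_TBL
GROUP BY "ABC"
ORDER BY "ABC";
```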
In this example I took a data set of just 150 records (a very small amount); it took SAP HANA approximately 203 ms to analyze and classify the data as shown above.
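For readers curious what the classification does conceptually: ABC analysis sorts the items by the chosen value column in descending order and assigns classes by cumulative contribution to the total. The following is not the PAL implementation, only a rough plain-SQL approximation of the same idea using window functions, assuming TESTDT_TAB as defined earlier (PAL's exact boundary handling may differ):

```sql
-- Conceptual approximation of ABC analysis (not the PAL algorithm itself):
-- items whose running share of the total "SepalLengthCM" falls within the
-- first 33% go to class A, the next 33% to B, and the rest to C.
SELECT "PLANT",
       CASE
         WHEN cum_share <= 0.33 THEN 'A'
         WHEN cum_share <= 0.66 THEN 'B'
         ELSE 'C'
       END AS "ABC"
FROM (
  SELECT "PLANT",
         SUM("SepalLengthCM") OVER (ORDER BY "SepalLengthCM" DESC, "PLANT")
           / SUM("SepalLengthCM") OVER () AS cum_share
  FROM TESTDT_TAB
) t
ORDER BY cum_share;
```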
SAP is planning to offer the Predictive Analysis toolset as a package combining the predictive analysis software licenses and HANA licenses. With this toolset, companies can spot fraudulent transactions and predict/forecast sales.
With its high processing capabilities coupled with high performance, SAP HANA together with the Predictive Analysis toolset can raise market standards and help companies run better. So I am looking forward to the general availability of this tool.
Also read my blog on
SAP HANA: My Experiences with SAP Business Objects Predictive Analysis tool
Thanks for taking the time to read this blog. Do comment with your views.
Hi Krishna Boss,
great blog
Petr.
Hello Petr
Thank you
Regards,
Kris
Krishna:
Good write-up. Thanks for putting this together.
One question: how does the 0.33 allocation work for classification A/B/C? Can you please expand on how the allocation of plants is done in your sample?
Rama
Rama:
I have specified 0.33 in the control table for the split, as you can see: "INSERT INTO #CONTROL_TBL VALUES ('PERCENT_A',null,0.33,null);"
Is this what you are asking?
Krishna:
No, my question was the overall logic you are trying to apply using predictive analysis.
Let me guess:
Since you mention 0.33 below for A, B and C: are you trying to read all 150 plants and evenly group them into 3 groups based on the total of the "SepalLengthCM", "SepalWidthCM", "PetalLengthCM" and "PetalWidthCM" values for each plant? So theoretically there will be 50 plants in each of groups A, B and C, correct?
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_A',null,0.33,null);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_B',null,0.33,null);
INSERT INTO #CONTROL_TBL VALUES ('PERCENT_C',null,0.33,null);
Yes exactly 🙂
Hi Krishna
I am quite a novice with this functionality, so sorry if I am asking basic questions. I could not understand the points below:
1. When we look at the parameter table there are 5 columns, but you created the control table with 4 columns.
2. Are you doing the analysis based on any specific column, or based on the number of rows?
3. You also mentioned in the prerequisites that NULL values are not allowed, but while inserting into CONTROL_TBL you are inserting null values.
4. The target table, which is our source, as per my understanding contains two columns (plant and sepal length), but in the procedure the target table is defined with 5 columns. Why is that so?
5. Finally, will there be 3 result tables, each containing one class of records, such as A, B and C?
6. What is the essence of start column, end column and thread number?
Please correct me if my understanding is wrong.
Can you explain the flow in slightly more basic terms if possible?
Thanks
Santosh
Hi Krishna
After going through it I understood it as below ,please correct me if I am wrong.
We have a data set which contains 150 records and 5 columns; this is the input for the procedure. The control table contains our parameters: in our case the start column (0) is "PLANT" and the end column (1) is "SepalLengthCM", and the procedure uses 2 threads to classify the data based on the ABC parameters. It gives us the classification and the plant name as output.
But one point which is still confusing to me: is the parameter on which we are doing the ABC analysis "SepalLengthCM", and how does it recognize the plant as the item? Does it take column 0 by default, or is it possible to specify the classification, plant and SepalLengthCM in the output?
Thanks
Santosh
Yes, and I feel that column 0 which we take must be "unique", something like a primary key. I chose SepalLengthCM as the classifying factor and gave the control parameters accordingly. Regarding null values: I don't have null values in my test data, as mentioned in the prerequisites. Hope you understand now?
Hi Krishna
1. In our example above we have the start column as 0, which is the plant, and the end column as 1, which is SepalLengthCM. Suppose we take the end column as 2 (assuming that SepalWidthCM is also part of our input data); then what would our classifying factor be? Is it SepalWidthCM, i.e. only the end column that is used for the classification analysis, or is it based on all the columns from start to end, in our case 0-2 [Plant, SepalLengthCM, SepalWidthCM]?
2. In the output we are displaying plant and SepalLengthCM, but if we look at our result table definition it is as below:
CREATE TYPE RESULT_T AS TABLE("ABC" VARCHAR(10),"ITEM" VARCHAR(100));
How does the system recognize that the item field data corresponds to the plant values? Is the starting column of our source data treated as the item data?
Thanks
Santosh
Hello Santosh,
Answers:
1) yes 0-2
2) Yes you are correct, it is taking the first column of my source data
Regards,
Krishna
Thanks Krishna.
Sorry for annoying you; I am left with one more query. If the classifying factor is columns 0-2, how can the plant numbers column be used for it, as it is VARCHAR and does not add any value to the calculation?
Correct me if I am wrong.
Thanks
Santosh
Hello Santosh,
Sorry for the delayed response. I tried using "0-2" as the classifying factor and yes, the result is as you mentioned. Thank you for analyzing this; I take away a point from here 🙂
Regards,
Krishna
Hi
Thanks Krishna. Without your blog, maybe I would not have known about this; thanks once again for sharing your wisdom.
To summarize: if we set the start column as 0, the first column of the input table will be displayed in the output along with the class.
If we set the end column as 2, the classifying factor will be columns 0-2.
The prerequisite is that the first column should be of VARCHAR type.
Thanks
Santosh.
Yes, absolutely Santosh, nicely summarized.
Do "like" the blogs you find interesting; it certainly helps boost the zeal to contribute more.
Regards,
Krishna Tangudu
Thanks for the prompt reply. Looking forward to more blogs from you.
Regards
Santosh
Thanks Krishna!
Welcome Shankar 🙂
Hi Krishna,
Thanks for sharing this.
I would like to know more about the L language; I am working on PAL with it. Can you guide me on how to find tutorials to understand the L language? It is really hard to find information on it.
Best Regards,
Zaib
Hi Krishna,
Good Blog
Regards,
Vivek
Hi Krishna,
Very helpful 🙂
Regards,
Akshay
Hi Krishna
Great and helpful blog.
Regards
Kumar 🙂
Very good for me.
Is it possible to integrate L Lang or R Lang with the Application Function Modeler (AFM)?
Thanks
Somnath Kadam