Closing the loop

The CRISP-DM process consists of several stages, starting with understanding the business question and the data so that a predictive model can be built to generate useful insights. Each phase in this process is repeated until a stable model is found that consistently generates valuable insights. The last step is equally critical: embedding these models into business processes so that decision makers get the insights they need to make the right decisions at the right time.

Expert Analytics has long enabled users to build complex analysis chains iteratively and, once the quality is ascertained, export nodes from the chain to HANA as stored procedures so they can be applied to new, production data. For a complex chain, however, this export mechanism could quickly become tricky, since it was up to the user to export the nodes and then call them in the right order in HANA. With the release of version 3.0, it is no longer necessary to manage this complexity manually: the entire chain of nodes up to a Model Statistics or Model Compare node can be exported as a single stored procedure on HANA. This stored procedure encapsulates the calls to the stored procedures of all sub-nodes, guaranteeing the correct sequence no matter how complex the chain is.

Business case: Predictive maintenance of water distribution pipeline network

To illustrate this feature, let's take the example of predictive maintenance on a water distribution pipeline network dataset with rare failure events (around 15%). The binary target column is "Safe", which indicates the status of the network described by various features, as shown in the sample below:

DataPreview.png

Building the analysis

We will solve this two-class problem using different classification algorithms in Expert Analytics and pick the best model for the job. To assess the quality of the models, we'll partition the data and then connect it to the different algorithms. Using the Model Statistics node, we'll then be able to examine the performance of the generated models, which is also used by the Model Compare node to pick the winning model. In this example, we'll use the HANA Auto Classification algorithm from the APL library, HANA C 4.5 from the PAL library, and a hand-coded LDA classification algorithm, EXT LDA, written in R.

ExpertChain.png

The summary report of the Model Compare node reveals the detailed statistics of the winning component.

Compare Summary.png

The visual representation makes it easier to spot the winning model, in this case the HANA Auto Classification algorithm…

Compare Chart.png

Final step in the CRISP-DM process: Deployment

Now that we have a model with reasonable performance on unseen data, it's time to bring it closer to the business users. To begin that process, let's export the entire chain up to the Model Compare node…

ExportChainFromMPC.png

The framework automatically picks the winning sub-chain and exports everything up to the Model Compare node to HANA as a stored procedure. An important point: the entire chain is exported "headless", without a dataset node, so it can be hooked up to any dataset for scoring. However, the metadata definition of the training dataset is persisted in the exported artifact so it can be inspected and mapped correctly for scoring. Only nodes that save a state (a model) are exported through this process, so they can be used for scoring; the others are automatically filtered out by the framework. For this example, let's name the wrapper procedure "PIPELINEFAILUREDETECTION"; it encapsulates the entire sub-chain up to this point.

ExportChainSP.png

ExportSuccess.png

Verifying the export

Once the export is successful, the procedure can be found under the "Procedures" folder in the schema it was exported to, TEST in our case…

SPinHANAStudio1.png

Double-clicking the procedure name will load the content for inspection in an SQL Script window…

SPinHANAStudio2.png

Take note of the procedure name and the generated input table type definition; they should be used as the reference for calling the procedure with the correct data definition. To test the exported procedure, we can simply call it and pass in the dataset used for training, which we know should work. Open an SQL console from the context menu as shown below:

OpenSQLConsole.png

Now, execute a statement to call the procedure passing in the dataset to score on:

call "TEST"."PIPELINEFAILUREDETECTION"("TEST"."PIPENW", ?)

SPinHANAStudio3.png

The last two columns show the result of scoring the dataset with the winning model in the chain, HANA Auto Classification.

Embedding the model chain in a business application

Any application that can call a HANA stored procedure can integrate the insights generated by this procedure. For this example, I'll use a HANA XS application, as it's very easy to write server-side JavaScript (click here for more details) to invoke this procedure on a new dataset. Since we've already seen how to score an entire table, let's see how a single transaction can be scored live from a business application.

Here are some key statements required to call the procedure from within the XS application:

Get a connection handle to the HANA database.

var connection = $.hdb.getConnection();

Create a local temporary table to hold the single (or multiple) lines of data to be scored:

connection.executeUpdate('CREATE LOCAL TEMPORARY TABLE #OBSERVATIONS LIKE "TEST"."EXPERT_MODEL_INPUT_TYPE_426D0B055AD44EB5998E1646FBE2B2FB"');


Note that the table type name is in the signature of the exported stored procedure that we saw earlier.

Once we have the single record from any source (e.g. user input), we can insert it into the temporary table so it can be passed to the stored procedure for scoring…

connection.executeUpdate(“INSERT INTO #OBSERVATIONS VALUES(?, ?, ?, …… , ?, ?)”, Density_Overload, Flow_Rate, Pipe_Length, …… , Pressure_Alarm, Inclination_Angle, Active_Stage);
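If the record comes in through an HTTP request to the XS service, the values bound above can be read from the request parameters. Here is a minimal sketch, assuming hypothetical parameter names that match the dataset columns (only a few columns are shown):

// Read the observation from the incoming HTTP request (parameter names are
// assumptions for illustration). Request parameters arrive as strings, so
// numeric columns need an explicit conversion before they are bound.
var Density_Overload = parseFloat($.request.parameters.get("Density_Overload"));
var Flow_Rate        = parseFloat($.request.parameters.get("Flow_Rate"));
var Pipe_Length      = parseFloat($.request.parameters.get("Pipe_Length"));
var Active_Stage     = $.request.parameters.get("Active_Stage"); // categorical column, kept as a string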


Next, the procedure is loaded so it can be called on the record placed in the temporary table #OBSERVATIONS.

var scoreProcedure = connection.loadProcedure('TEST', 'PIPELINEFAILUREDETECTION');


Once the procedure is loaded, it can be called with the observations passed in as the data to score.

var results = scoreProcedure('#OBSERVATIONS');


The rows returned from the procedure call can now be used in the business application to embed the insights generated by the model chain exported from Expert Analytics.

var rows = results.RESULT;
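Putting the pieces together, here is a minimal sketch of what the complete .xsjs service could look like. It simply assembles the statements shown above; the file name, the reduced column list and the sample values are assumptions for illustration, and the generated table type name must match the one produced by your own export:

// score.xsjs (hypothetical name): scores one observation with the exported model chain
var connection = $.hdb.getConnection();
try {
    // Temporary table matching the input table type generated during the export
    connection.executeUpdate('CREATE LOCAL TEMPORARY TABLE #OBSERVATIONS LIKE "TEST"."EXPERT_MODEL_INPUT_TYPE_426D0B055AD44EB5998E1646FBE2B2FB"');

    // Bind one observation; in a real service the values come from the request
    // (as sketched earlier) and the column list must match the table type exactly
    connection.executeUpdate("INSERT INTO #OBSERVATIONS VALUES(?, ?, ?)",
        0.82, 120.5, 37.0); // hypothetical values for Density_Overload, Flow_Rate, Pipe_Length

    // Load and invoke the exported wrapper procedure on the observation table
    var scoreProcedure = connection.loadProcedure("TEST", "PIPELINEFAILUREDETECTION");
    var results = scoreProcedure("#OBSERVATIONS");

    // Return the scored rows (predicted class and confidence) to the caller as JSON
    var rows = results.RESULT;
    var output = [];
    for (var i = 0; i < rows.length; i++) {
        output.push(rows[i]);
    }
    $.response.contentType = "application/json";
    $.response.setBody(JSON.stringify(output));
} finally {
    connection.close();
}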


Sample Application: Predictive Maintenance Dashboard

Log on to the XS application on HANA

PMDB1.png

Click "Predict Failure" to call the model chain stored procedure with one record generated from the input controls (sliders):

PMDB2.png
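Behind that button, the dashboard only needs to call the XS service sketched earlier with the current slider values. A hypothetical client-side sketch (the service path, the parameter names and the way slider values are obtained are assumptions, not taken from the sample application):

// Hypothetical handler for the "Predict Failure" button; the caller passes in
// the current slider values and the service path is assumed to be /pipeline/score.xsjs
function predictFailure(densityOverload, flowRate, pipeLength) {
    var params = "Density_Overload=" + encodeURIComponent(densityOverload) +
                 "&Flow_Rate=" + encodeURIComponent(flowRate) +
                 "&Pipe_Length=" + encodeURIComponent(pipeLength);

    var xhr = new XMLHttpRequest();
    xhr.open("GET", "/pipeline/score.xsjs?" + params);
    xhr.onload = function () {
        // The service returns the scored rows as JSON: predicted class and confidence
        var prediction = JSON.parse(xhr.responseText);
        console.log(prediction); // update the dashboard widgets with the prediction here
    };
    xhr.send();
}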

All nodes in the chain will process this single record (or table where applicable) and return the prediction as shown below:

PMDB3.png

Assuming these sliders were driven by various sensors in the pipeline network, a failure could be detected in advance, with a confidence indicator, allowing decisions to be made in time to prevent the failure.

PMDB4.png


9 Comments


    1. Jayanta Roy Post author

      That’s correct Antoine,

In this example, the winning model comes from the HANA Auto Classification component.

Now, if you imagine a more complex chain with a few more algorithm components on that sub-chain, all their models will be exported in the same sequence, wrapped up by the final procedure name that you provide on the export dialog.

      This wrapper procedure will call all the models in that sub-chain in the right order on the new dataset when invoked (as in the example above).

  1. Ted Kwon

    Hi Jayanta,

Thanks for your detailed introduction to developing predictive models (a discrimination model between binary choices).

The PA module clearly has strong integration capabilities, covering the APL, PAL and LDA algorithms altogether.

– APL and PAL seem to be non-normative logic (LDA is more like normative logic, which relies more on statistical assumptions and restrictions on how the source data has been generated).

– By analogy, the example above reminded me of stepwise regression model building in SAS.

Please correct me if my understanding falls short.

One question: how does APL differ from PAL? Is there any business case where PAL would show a better fit?

    Regards, Ted

    1. Marc DANIAU

Hi Ted, APL uses an SAP proprietary algorithm; it takes care of data preparation steps automatically (e.g. normalization, binning, grouping, missing values, outliers) and chooses the best model for you. PAL offers a number of standard algorithms and statistical functions that the user can assemble the way he wants. The PAL user is expected to be a data-mining expert who knows how to select the appropriate algorithm and how to parameterize it in order to best answer a given business question. The APL engine is designed to bring data-mining functions to a broader population of users, not limited to experienced data scientists. APL requires less time and effort to address a business problem, but it is not suitable for building the kind of fully customizable solution that one can build with PAL or R or any machine-learning tool. A user who decides to run a standard algorithm, because he knows it very well from practice or for any other reason, will go for PAL or R, not APL. APL provides classification only for binary targets (1/0, Yes/No, Working/Broken); if you have a multi-class target problem you need to use PAL or R.

      1. Marc DANIAU

For a multi-class problem, if you want to leverage APL you can work with several binary targets derived from the original target; for instance, if the target contains, say, 3 values (A, B, C), you can use three new columns, Is_A, Is_B and Is_C, each containing the 2 values Yes/No. Here is another article that might interest you: http://scn.sap.com/docs/DOC-70168

      2. Ted Kwon

        Hi Marc

Thanks for the update and clarification.

        APL – business user & analyst friendly, PAL/R – data miner & scientist friendly

One solution can't fit all problems. The specific use case and the inherent data quality decide the right approach (tools). To take a few examples, SPSS prides itself on user-friendly UI/menu navigation and is relatively strong in marketing research.

SAS provides clear algorithms and statistical rigour for categorical data analysis & visualisation.

SAP Predictive Analytics has a strong connection with HANA and with BI tools across the world.

I believe that the best predictive analysis is the result of a combined effort between business, analysts and scientists. SAP Predictive Analytics on SAP HANA would be a good (or the best) workplace in this regard. Ted

  2. Alejandro Serrano

    Hi,

I think you may be able to help me. In my case I have a table in BW. I want to access this table, pre-process it and apply an AFL predictive function (Single Exponential Smoothing). Then I want to store the results and access them from BW. After some research, I managed to find 5 different ways to approach this problem:

    1. Using SQLScript from SAP HANA Studio
    2. Using Application Function Modeller (AFM) from SAP HANA Studio
    3. Using SAP Predictive Analytics (SAP PAA)
4. Using a BW report (ABAP) to trigger an SQL procedure
    5. Using Predictive Algorithms native in BW

In all cases I got stuck at some point. Please have a look at https://answers.sap.com/questions/148420/five-approaches-to-execute-a-predictive-afl-functi.html.

    Thanks and regards!

