Closing the loop
The CRISP-DM process has a number of stages that start with understanding the business question and data so a predictive model can be built to generate useful insights. Each phase in this process are repeated until a stable model is found that generates valuable insights consistently. The last step in this process is equally critical that embeds these models into business processes so the decision makers can get the insights to make the right decisions at the right time.
Expert Analytics is a solution that has enabled users to build complex chains iteratively and once the quality is ascertained, the nodes from the chain can be exported to HANA as stored procedure so they can be applied on new, production data. However, if the chain was complex, this export mechanism could soon become tricky since it is up to the user to export the nodes and then call them in the right order in HANA. With the release of version 3.0, it is no longer necessary to manage this complexity manually since the entire chain of nodes up to a Model Statistics or Model Compare node can be exported as a single stored procedure on HANA. This stored procedure encapsulates calls to all other stored procedures for all sub nodes guaranteeing the sequence, no matter how complex the chain.
Business case: Predictive maintenance of water distribution pipeline network
To elaborate this feature, let’s take an example of predictive maintenance on water distribution pipeline networks’ dataset that has rare failure events (around 15%). The binary target column is “Safe” which indicates the status of the network described by various features as shown in the sample below:
Building the analysis
We will solve this two class problem using different classification algorithms in Expert Analytics and pick the best model for the job. To ascertain the quality of the models, we’ll partition the data and then connect it to different algorithms. Using the Model Statistics node, we’ll then be able to interrogate the performance of the models generated that will also be used by the Model Compare node to pick the winning model. In this example, we’ll use HANA Auto Classification algorithm from APL library, HANA C 4.5 from PAL library and a hand coded LDA classification algorithm EXT LDA written in R.
The summary report of the Model Compare node reveals the detailed statistics of the winning component.
The visual representation makes it easier to spot the winning model, HANA Auto Classification algorithm in this case…
Final step in the CRISP-DM process: Deployment
Now that we have our model that has reasonable performance on unseen data, it’s time to bring it closer to the business users. To being that process, lets export the entire chain up to the Model Compare node…
The framework automatically picks the winning sub-chain and exports everything up to the Model Compare node to HANA as stored procedure. An important point is the entire chain is exported “headless”, without a dataset node so it can be hooked up to any dataset for scoring. However, the metadata definition of the training dataset is persisted in the exported artifact so it can be inspected and mapped correctly for scoring. Only nodes that save a state (model) are exported through this process so they can be used for scoring, others are automatically filtered out by the framework. For this example, let’s name this wrapper procedure “PIPELINEFAILUREDETECTION” that encapsulates the entire sub-chain up to this point.
Verifying the export
Once the export is successful, the procedure can be found listed under “Procedures” folder in the schema it was exported to, TEST in our case…
Double clicking the procedure name will load the content for inspection in an SQL Script window…
Take a note of the procedure name and the generated input table type definition that should be used as reference to call the procedure with correct data definition. Now, to test the exported procedure, we can simply call the procedure and pass it in the dataset used for training that we are sure should work. Open an SQL Console from the context menu as shown below:
Now, execute a statement to call the procedure passing in the dataset to score on:
call “TEST”.“PIPELINEFAILUREDETECTION”(“TEST”.“PIPENW”, ?)
The last two columns are showing the scored result on the dataset generated by the winning model in the chain, HANA Auto Classification.
Embedding the model chain in a business application
Here are some key statements required to call the procedure from within the XS application:
Get a connection handle to the HANA database.
var = $.hdb.getConnection();
Create a local, temporary table to hold the single (or multiple) line of data that can be scored:
connection.executeUpdate(‘CREATE LOCAL TEMPORARY TABLE #OBSERVATIONS LIKE “TEST”.”EXPERT_MODEL_INPUT_TYPE_426D0B055AD44EB5998E1646FBE2B2FB”‘);
Note that the table type name is in the signature of the exported stored procedure that we saw earlier.
Once we get the single record from any sources (e.g. user input), we can insert that into a temporary table so we can pass it into the stored procedure for scoring…
connection.executeUpdate(“INSERT INTO #OBSERVATIONS VALUES(?, ?, ?, …… , ?, ?)”, Density_Overload, Flow_Rate, Pipe_Length, …… , Pressure_Alarm, Inclination_Angle, Active_Stage);
Next the procedure is loaded so it can be called on the record created in a temporary table OBSERVATIONS.
var scoreProcedure = connection.loadProcedure( ‘TEST’, ‘PIPELINEFAILUREDETECTION’);
Once the procedure is loaded, it can be called with the observations passed in as the data to score.
var results = scoreProcedure (‘#OBSERVATIONS’);
Rows from the results returned from the procedure call can now be used in the business application to embed the insights generated by the model chain exported from Expert Analytics.
var rows = results.RESULT;
Sample Application: Predictive Maintenance Dashboard
Logon to XS application on HANA
Click on the “Predict Failure” to call the model chain stored procedure with one record generated from input control (sliders)
All nodes in the chain will process this single record (or table where applicable) and return the prediction as shown below:
Assuming these sliders were driven by various sensors in the pipeline network, a failure could be detected in advance with a confidence indicator to allow making decisions in time and preventing failure.