Kurt Holst

Using data mining best practices to ensure optimal predictive flow

Hi,

This video shows an example of how to build an end-to-end machine-learned model using SAP Predictive Analysis.

The video also walks you through training your model with respect to bias in your data.

The video demonstrates what happens when you sample incorrectly from a sorted, and therefore biased, dataset. The dataset is based on the well-known Iris dataset that ships with R. Let me know if you would like a copy of the dataset so that you can try this yourself.
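The effect is easy to reproduce outside the tool. The sketch below assumes a hypothetical, class-sorted stand-in for the Iris layout (50 rows per class, stored class by class). It compares a "First N" sample with a simple random sample of the same size. The helper names are illustrative and are not SAP Predictive Analysis functions.

```python
import random
from collections import Counter

# Hypothetical stand-in for a class-sorted dataset: 50 rows per class,
# stored class by class, like the row order of the Iris data.
CLASSES = ["setosa", "versicolor", "virginica"]
rows = [(i, label) for label in CLASSES for i in range(50)]


def first_n(data, n):
    """Take the first n rows, as a 'First N' sample would."""
    return data[:n]


def simple_random(data, n, seed=0):
    """Draw n rows uniformly at random, without replacement."""
    return random.Random(seed).sample(data, n)


def class_counts(sample):
    """Count how many rows of each class ended up in the sample."""
    return Counter(label for _, label in sample)


biased = class_counts(first_n(rows, 75))
mixed = class_counts(simple_random(rows, 75))

# First N holds 50 setosa and 25 versicolor rows. Virginica never appears,
# so a model trained on this sample cannot learn that class at all.
print("First N:      ", dict(biased))
print("Simple random:", dict(mixed))
```

The random sample spreads its 75 rows over all three classes. The First N sample is determined entirely by the sort order.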

Finally, the trained model is applied to new data to make predictions.

As the video illustrates, the way you sample your data strongly influences the outcome of your data mining model.

To reduce bias, how would you ensure that data is picked at random for both samples and is not reused between them?

SAP Predictive Analysis offers these sampling options: First N, Last N, Every N, Simple Random and Systematic Random. Which of them would you choose?
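The five options can be compared side by side. The sketch below is a tool-independent approximation based on the usual meaning of each name. In particular, it assumes that Systematic Random means "random start, then a fixed interval". SAP Predictive Analysis may implement these options differently, and all function names are hypothetical.

```python
import random


def first_n(rows, n):
    """'First N': the n rows at the top of the dataset."""
    return rows[:n]


def last_n(rows, n):
    """'Last N': the n rows at the bottom of the dataset."""
    return rows[-n:] if n > 0 else []


def every_nth(rows, step):
    """'Every N': every step-th row, starting with the first row."""
    return rows[::step]


def simple_random(rows, n, rng):
    """'Simple Random': n rows drawn uniformly, without replacement."""
    return rng.sample(rows, n)


def systematic_random(rows, n, rng):
    """'Systematic Random': a random start, then a fixed interval k."""
    if not 0 < n <= len(rows):
        raise ValueError("n must be between 1 and len(rows)")
    k = len(rows) // n              # sampling interval
    start = rng.randrange(k)        # random offset inside the first interval
    return rows[start::k][:n]


rows = list(range(150))             # row positions of a 150-row dataset
rng = random.Random(42)
print(first_n(rows, 5))             # [0, 1, 2, 3, 4]
print(last_n(rows, 5))              # [145, 146, 147, 148, 149]
print(every_nth(rows, 50))          # [0, 50, 100]
print(simple_random(rows, 5, rng))
print(systematic_random(rows, 5, rng))
```

Only the last two options involve randomness. The first three depend entirely on the row order of the dataset.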

To reduce the risk of over-learning (overfitting), make sure that no data is reused across the training, testing and validation sets.

This is especially important when the data is sorted, because an order-based sample would then introduce a bias into the result.
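One common way to meet both requirements is to shuffle the data once and then cut it into non-overlapping partitions. The function below is a hypothetical, tool-independent sketch of that idea. The 60/20/20 proportions and the seed are arbitrary assumptions, not values taken from the video.

```python
import random


def split_without_reuse(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Shuffle once, then cut the shuffled rows into three disjoint parts.

    Each row lands in exactly one of training, testing or validation,
    so no row is reused across them.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("expected three fractions that sum to 1")
    shuffled = list(rows)                   # copy: the caller's sorted data is untouched
    random.Random(seed).shuffle(shuffled)   # break up any sort order first
    n = len(shuffled)
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]   # remainder, so no row is lost
    return train, test, validation


# Hypothetical class-sorted data: 50 rows per class, stored class by class.
rows = [(i, label) for label in ("setosa", "versicolor", "virginica")
        for i in range(50)]
train, test, validation = split_without_reuse(rows)
print(len(train), len(test), len(validation))
```

Shuffling before cutting is what removes the influence of the sort order. A stratified split goes further by also keeping the class proportions equal in each partition.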

  [Figure: Sampling data for training]

Best regards,

Kurt Holst


      1 Comment
      Former Member

      Hi Kurt,

      Nice one.

      Thanks. Using SAP Lumira with Excel data sources, I have created some graphical representations of data:

      Data Geek Challenge: Rajinikanth: Great Living Indian in India Cinema:

      http://scn.sap.com/community/lumira/blog/2013/09/09/super-star-rajinikanth

      Data Geek Challenge - Sachin Tendulkar Using SAP Lumira:

      http://scn.sap.com/community/lumira/blog/2013/09/01/data-geek-challenge--sachin-tendulkar-using-sap-lumira

      Regards,
      Hari Suseelan