Should data mining be a part of a data warehousing tool?

Data Mining … we keep hearing about it so often … but then what is all the fuss about data mining…?

Imagine this … you are a consumer company selling widgets for the past 20 years and have about 5 years of sales data with you with a reasonable amount of data cleanliness.

A product variant is being launched with much fanfare and a lot of money spent on advertising. The budget committee is concerned that the spend is on the higher side and wants to know how the company has fared in similar launches, how much sales growth a promotion has produced, and how much of the sales can be attributed to a particular form of advertising…

A very pertinent question … but what does it mean for us…?

Scenario 1: You call in the data mining experts. Who are they?
Typically domain experts who understand the market dynamics very well, having worked in the same field for years.

The number crunchers – those who can make sense of data.

These are the people who work with the domain experts to arrive at the required answer using complicated models: some statistical, some stochastic, some causal and so on … jargon to many!

Scenario 2: The data warehousing team is asked to provide some answers to the question, since they know best the data that they store…

An interesting scenario … what do you do?

Let's take the following scenario in SAP BI…

The first answer that you get is “Go to TCODE RSANWB” (I have come across some such answers in the forums!).

What is this? It is the Analysis Process Designer, which ‘also’ has data mining capabilities.

But then the dilemma surrounding data mining in a data warehouse environment is that there is a vast skill set mismatch.

The person who owns or runs the data warehouse understands the data that is in there, but not necessarily all the information it holds. The architect knows the data models and relationships, but the knowledge brought in by the data mining modeler is absent. For instance, I know the data resides in the Sales Order Line Item cube … but I have no clue whether I should use clustering, a decision tree, a regression or something else.

Statistics is not my cup of tea. And even if I do decide to try something out … I get answers which I cannot make sense of. I can only present the results to the person who asked for them and hope that this is what they wanted…

Data mining has, moreover, become a buzzword, with so many companies setting up shop in the field of analytics and specialized niche domains. As a result, each company tries to distinguish itself through complex data mining models and domain expertise.

One thing that gets lost in the detail is the simpler models for data mining. We used to do a t test or an F test in Excel and on paper … such simple tests could still be done in SAP BI, but once they are covered up and dressed in the form of a data mining model … the meaning gets lost.
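
A simple test of this kind needs very little machinery. As a minimal sketch (plain Python, not SAP BI functionality), here is a two-sample t statistic in Welch's form and an F ratio of variances, comparing weeks with a promotion against weeks without one. The weekly sales figures are made up for illustration, and the function names `welch_t` and `f_ratio` are hypothetical helpers, not part of any SAP tool.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two samples.

    Uses Welch's form of the t test, which does not assume equal variances.
    """
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def f_ratio(a, b):
    """Return the ratio of the two sample variances, larger over smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical weekly unit sales (illustrative numbers only).
promo_weeks = [120, 135, 128, 140, 132]
normal_weeks = [110, 115, 108, 118, 112]

t, df = welch_t(promo_weeks, normal_weeks)
f = f_ratio(promo_weeks, normal_weeks)
print(f"t = {t:.2f}, df = {df:.1f}, F = {f:.2f}")
```

The t statistic and degrees of freedom would then be looked up in a t table, just as on paper. The point is only that the arithmetic is small; knowing whether the comparison is a fair one is still the domain expert's job.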

This may be one of the reasons why data mining in SAP BI, or in any data warehouse, is seldom used.

What do you think? Should data mining be a part of mainstream data warehousing, or be left to the domain experts and the analytics experts to figure out?

  • Data mining is a more complex topic than reporting. Most business users cannot even formulate the problem correctly for someone in the BI team to arrive at an answer.
    Also, most BI data is not modelled for efficient mining. And you need specially trained, statistician-type people to do this.
    For all the good things it can do, the hype is sometimes far from the truth. For example, analyzing whether a certain promotion was behind the success of a product is hard. Marketing and sales are separated by a time lag, and they do not even share the same characteristics for an apples-to-apples comparison: sales happen at the material number level, while marketing can happen at an arbitrary level of the product hierarchy. Such situations can also be modelled in mining, but the many probabilities and fudge factors involved will produce answers that do not have sufficient statistical ‘confidence’.
    • I was trying to address the fact that organizations know that data mining tools exist but are not geared to use them.
      And in such a scenario … do we need to provide such tools at all?
      In most cases flat file dumps are taken and sent to analytics providers, who work on them and return the results.
      I also wanted to know whether the same situation exists across projects and, if possible, to understand whether other tools provide data mining capabilities or whether it is something that SAP alone offers but is seldom used.
      • Personally, I have not seen my clients take to mining in a big way, and even those that do don’t use SAP for this purpose.

        SPSS-type tools are what statisticians use in academia (I remember taking an introductory class when I did my MBA), and they are very specialized.

        I do not think SAP should bother to compete with them in the short term, given the small footprint.

        • Precisely my point too … I also recalled the t test etc. from possibly the same introductory class in my MBA!
          It becomes something like a gadget … you initially use the various features of the gadget to sell it, but it is ultimately used for its core purpose. You never hear of a phone that is used more as an MP3 player than as a phone. It can also be used as an MP3 player, but very few people, if any, would buy a phone to use it as an MP3 player…

          Something like that … ultimately the tool comes to be used for what it is meant to do, and the additional accessory features might be used, but not as the critical and most-used feature.

          And that leads me to think … how do you differentiate between similar tools? Should it be only on performance and pedigree, or on the features available…?
          my 0.02

  • I actually have the pleasure of being both a statistician and a BI analyst, and have used analysis process designs in many scenarios. However, the mining and analysis options in the ‘Enhanced Analytics’ take too much time for a true analyst to utilize.

    The major issue with these tools is that you must be able to utilize them in a production environment, unless you have significant data in your other environments. Secondly, I’ve encountered numerous performance issues that are mostly discovered when moving to production, where there is significantly more data than in our BWQ environment. Lastly, due to all the aforementioned factors and the lack of help documentation and examples on how to utilize each analysis feature, too much time is lost for an analyst who is trying to solve a problem.

    Depending on the particular analyst’s abilities, I would recommend a statistical package like SAS, SPSS or S-PLUS (or R, which is free) for quicker analytics off of query results.

    Overall though, the analysis process design is fabulous for creating new DSO/Cubes for pre-calculated aggregated data used for metrics and dashboards.