For as long as I’ve been working in Business Intelligence and Warehousing, there have really been only two schools of thought on how to approach a Data Warehouse: Kimball or Inmon. At a high level, the key difference is that Kimball proposes building from the ground up, while Inmon advocates a top-down approach. That’s clearly a generalised statement of the differences, and not one designed to inspire debate. I’ve seen both approaches in action and both have their pluses and minuses. The topic of debate here is not their differences, but whether the theory behind both approaches is still valid given current advances in data warehousing.
The core of both designs is to provide an efficient method of data storage and retrieval. At the time they were devised, memory and storage were both expensive, which led to the use of Data Marts and aggregated data as a way of minimising the amount of data stored within a data warehouse.
This is the key problem with Data Marts: they are designed for aggregated data. The vast majority of users I have spoken to during requirements gathering exercises respond with an all too familiar statement when asked what their requirements are: “we want to report on everything” (breadth). When asked how much of everything they would like to hold, the next response is “everything of everything” (depth). From a user perspective, that’s a fair enough statement. Why shouldn’t they be allowed to report on everything to make better business decisions?
To accommodate the requirement of breadth, star schemas evolve into snowflake schemas and multiple Data Marts are created. To store the level of depth required for reporting, either the Data Marts hold line item data, resulting in massive fact tables and performance problems, or the data is stored in the operational data store (ODS), with clever ways devised to drill from a Data Mart into the ODS.
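To make the depth trade-off concrete, here is a minimal sketch in Python. The rows, column names and the helper functions `build_mart` and `drill_through` are all made up for illustration; they are not taken from Kimball, Inmon or any particular product. The sketch aggregates line items up to a coarser grain, as a Data Mart would, and then shows that getting back to the detail means going to wherever the line items are kept.

```python
from collections import defaultdict

# Hypothetical line-item rows, standing in for the detail held in an ODS.
line_items = [
    {"order_id": 1, "product": "widget", "region": "north", "amount": 10.0},
    {"order_id": 1, "product": "gadget", "region": "north", "amount": 25.0},
    {"order_id": 2, "product": "widget", "region": "south", "amount": 10.0},
    {"order_id": 3, "product": "widget", "region": "north", "amount": 12.5},
]


def build_mart(rows, dimensions):
    """Aggregate line items to the grain defined by `dimensions`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


def drill_through(rows, dimensions, key):
    """Return the line items that sit behind one aggregated row."""
    return [r for r in rows if tuple(r[d] for d in dimensions) == key]


# The aggregated mart is small, but the order-level detail is gone.
sales_by_region = build_mart(line_items, ["region"])

# Recovering the detail requires the original line items.
north_detail = drill_through(line_items, ["region"], ("north",))
```

The mart answers "sales by region" in two rows, but it cannot say which orders or products made up the total. Adding dimensions (for example `["region", "product"]`) widens the mart, and keeping every line item in it is exactly what produces the massive fact tables described above.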
Although not perfect, the theory has held up well considering the barriers faced. Those barriers, which shaped the design of data warehousing, are rapidly falling. Data storage and memory are rapidly decreasing in cost, and have been for a long time. What’s more relevant is the maturity of new models and approaches to business intelligence and warehousing. To name a few, in recent times, we have seen the rise and maturity of:
These new models and approaches do not fit neatly into the theory of data warehousing as we know it. My question is: should the theories of Kimball and Inmon be updated for the modern advances in data warehousing, or do we need a fresh approach altogether?