This blog looks at the situation of a customer who migrated his hand-crafted Oracle data warehouse to SAP BW. It is a tangible, real-world case that contradicts many of the alleged pros and cons of the differing approaches to data warehousing. Beside TCO, the SQL Access to BW via Data Federator allows even for a surprising 1:1 comparison regarding query performance.
There is a long running, almost religious debate on how to build up a data warehouse (DW), either using a DW management application – like SAP BW – or using a set of loosely coupled tools and manually crafting the management of that DW with an RDBMS as the central infrastructure piece. The latter often dilutes the difference between an RDBMS software suitable for high data volumes and a DW. Examples are the traditional RDBMS vendors like Oracle, Microsoft, IBM and Sybase but also the upcoming and/or specialized vendors like Teradata, Greenplum, Netezza or Vertica. Fundamentally, they offer an RDBMS. In order to build up a DW this RDBMS needs to be supplemented by a management software that is based on ETL and other tools (e.g. for data enrichment, data quality etc.). That software manages challenges like
- Semantically correct extraction: Just moving the contents of a table in the operational system to the DW is often not sufficient; more tables need to be involved and not even a big join does the trick as data gets semantically enriched through application code; SAP ERP is a (typical) example of that – just consider the many infosets.
- Coordination and scalability of data movement processes: That includes necessary transformations and locking. Just consider such trivial examples like normalizing dates such as 14.4.2009, 04/14/09, 14 Apr 09, … to a uniform representation like 20090414. Locking is necessary to protect that mass data movements expose or create inconsistent states in the warehouse.
- Physical layout: Data needs to be stored somewhere and someone has to decide and manage tables or sets of tables, degree of normalization, organization and isolation of dimensions (which might then lead to stars or snowflakes) etc.
- Consistency of the data: This is typically achieved by a transactional concept built on top of the DW entities which – in turn – maps to DB transactions.
In summary: the DW is the combination of an RDBMS with a DW management software. The latter can be created proprietarily (via consultant projects and/or via evolution over time) or a standard package such as SAP BW can be used. It is obvious that non-standard, hand-crafted approaches are likely to imply a high TCO as every change (to a data source or to a model) requires expert programmers to adjust the software. On the bonus side, it is often argued that a manually managed DW can be tailored to the individual needs and that there is so many more options that one can use, especially for query performance tuning.
One of SAP’s customers ran such a hand-crafted DW based on an Oracle RDBMS. Many of source systems were operational SAP systems. Extraction was proprietary. The system started small and grew over the years and with that grew the management complexity and consequently the TCO. In order to tackle the latter, BW was considered, particularly for the DW management advantages. End users accessed the RDBMS via Business Objects’s WebI using SQL as the query language. With the switch to BW the customer wanted to keep WebI as the frontend tool for his end users.
Considering SQL Access to BW via Data Federator there was suddenly a 1:1 comparison between (productive) queries running on an Oracle RDBMS versus the same queries running on a BW system that Understanding Query Performance in NW BI and BIA. The result can be seen in the chart below; each query is represented by an individual color. While the overall superior performance of BW and BWA is evident in the chart, it is important to note that fast queries under Oracle remain fast under BW/BWA and that slow queries under Oracle become fast under BW/BWA. The overall variance of query runtimes is minimal under BW/BWA leading to a stable performance pattern. Finally, the history of the compared systems suggests that the Oracle DW has received many many hours of manual tuning over the years while the BW/BWA system has hardly received any tuning. So, even here, the TCO aspect is apparent.
This is just an instance that provides the rare opportunity of directly comparing two widely differing approaches. In my opinion, this example also questions the value and the significance of assessing data warehouses by (only) its underlying RDBMS. See Gartner’s magic quadrant on DW DBMS as an instance of such an analysis. Here, for this customer, the RDBMS does not make the difference: his BW also runs on top of an Oracle RDBMS.