
Part 1: Competitors, analysts and customers frequently confront SAP with the statement that NW BI (a.k.a. BW) could provide better performance if it simply employed some of the performance-enhancing features that have been added to mainstream RDBMS software over the last few release cycles. They often point to materialized views (or summary tables), new index types (e.g. bitmaps or domain vectors), aggregation operators (like grouping sets, rollup), loaders, etc. In a series of three weblogs I’d like to briefly tackle this argument and prove it wrong. The weblogs will discuss the following three topics:

  1. What are some of the DB-specific performance-enhancing features used in NW BI? (part 1)
  2. Materialized views vs. NW BI aggregates (part 2)
  3. Grouping sets and loaders (part 3)

So let’s dive into the techy part and look at some of the advanced DB features on which NW BI relies. Probably the most prominent one is the usage of Oracle’s bitmap indexing technology, in place since BW 1.2 (based on Oracle 8.0). A typical infocube query processing plan employs two to three bitmap indexes that are merged in order to propagate the selectivity given by several filters in various dimension branches. Infocube queries that do not use bitmap indexes are likely to show poor performance; in such cases it has to be analysed why Oracle’s optimizer has not chosen them. For real-time infocubes, which use standard B-tree indexes on the F fact table (to avoid deadlocks, since bitmap indexes do not provide row-level locks), NW BI relies on Oracle’s B-tree-to-bitmap conversion, i.e. the conversion of standard B-tree indexes to bitmaps at query runtime.
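
To make this a bit more concrete, here is a minimal sketch of what such indexing could look like on Oracle. The table, column and index names are made up and the layout is heavily simplified; this is not the DDL that NW BI actually generates:

  -- Simplified sketch of an F fact table with two dimension key columns
  -- and one key figure (illustrative names only).
  CREATE TABLE "/BIC/FSALES" (
    KEY_SALEST NUMBER(10) NOT NULL,   -- time dimension key (illustrative)
    KEY_SALES1 NUMBER(10) NOT NULL,   -- e.g. customer dimension key
    AMOUNT     NUMBER(17,2)
  );

  -- One bitmap index per dimension key column.
  CREATE BITMAP INDEX "/BIC/FSALES~010" ON "/BIC/FSALES" (KEY_SALEST);
  CREATE BITMAP INDEX "/BIC/FSALES~020" ON "/BIC/FSALES" (KEY_SALES1);

  -- Filters on two dimension branches allow the optimizer to AND-merge the
  -- two bitmaps before touching any fact table rows.
  SELECT SUM(f.AMOUNT)
    FROM "/BIC/FSALES" f
   WHERE f.KEY_SALEST IN (200701, 200702)
     AND f.KEY_SALES1 = 4711;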

Another good example is partitioning. NW BI uses partitioning for several purposes:

  • partition pruning: E fact tables can be range partitioned along a time characteristic. The underlying assumption is that almost every analytic query has a filter on time. Typically, it does not make sense to analyse data without defining (i.e. restricting) the time scope. In query processing terms this means that a query optimizer can focus on the partitions that hold the data and indexes relevant for the query and discard (“prune”) all the others (see the SQL sketch after this list). By the way: there are plans to extend the same concept to DataStore objects.
  • parallelization: Whenever table scans have to be performed on a partitioned table – independent of the type of partitioning, i.e. hash, range, list, round-robin etc. – the optimizer can use the partitioning setup to parallelize the table scans. This is, for example, useful when building up new aggregates or rolling up recently loaded requests into aggregates. Infocube compression can also benefit, as can query processing in cases where partition pruning is not an option, e.g. due to a hash-partitioned fact table.
  • bulk deletion: If partitions are organized in such a way that each holds a ‘unit of deletion’, i.e. data that is likely to be deleted together, then an expensive DELETE command can be translated into dropping one or more partitions. Many hand-crafted data warehouses are built on that notion, e.g. the frequently cited data warehouse at Walmart.
    In NW BI, a typical ‘unit of deletion’ is the data request. It constitutes data that has been loaded together, that originated in the same system or business process and that might be corrupted due to a failure in the communication with that source system, wrong conversions, faulty tracking or whatever else can go wrong. In such a situation, the data is removed from the NW BI system and reloaded. Removal should be fast even if the data volume is huge; fast deletion of a data request is therefore an important operation. This is supported through the request-wise partitioning of an infocube’s F fact table.
    Similarly, bundling data along the time dimension in order to move it into an archive requires fast removal of huge amounts of data. This is a second reason (besides partition pruning) to partition the data in the E fact table of an infocube along a time characteristic.
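
As a rough illustration, here is a minimal sketch of what range partitioning, partition pruning and partition-wise deletion could look like on Oracle. The table, column and partition names are made up and the layout is heavily simplified; in a real system the E fact table is partitioned on the SID of a time characteristic such as 0CALMONTH or 0FISCPER, and this is not the DDL that NW BI actually generates:

  -- Simplified sketch of an E fact table, range-partitioned along a time
  -- characteristic (illustrative names, not BW-generated DDL).
  CREATE TABLE "/BIC/ESALES" (
    KEY_SALEST NUMBER(10) NOT NULL,   -- time dimension key (illustrative)
    KEY_SALES1 NUMBER(10) NOT NULL,   -- e.g. customer dimension key
    AMOUNT     NUMBER(17,2)
  )
  PARTITION BY RANGE (KEY_SALEST) (
    PARTITION P200701 VALUES LESS THAN (200702),
    PARTITION P200702 VALUES LESS THAN (200703),
    PARTITION PMAX    VALUES LESS THAN (MAXVALUE)
  );

  -- Partition pruning: the time filter lets the optimizer touch only the
  -- partition(s) that can contain matching rows.
  SELECT SUM(AMOUNT) FROM "/BIC/ESALES" WHERE KEY_SALEST = 200701;

  -- Bulk deletion/archiving: dropping a partition instead of running an
  -- expensive DELETE; the same idea applies to request-wise partitions of
  -- the F fact table.
  ALTER TABLE "/BIC/ESALES" DROP PARTITION P200701;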

There are a number of other non-standard DB features used in NW BI. It would be too much to describe them all in the same detail as the ones listed above. Others worth mentioning include DB2’s multi-dimensional clustering (MDC), which is in many ways similar to range partitioning, Oracle’s MERGE statement for improved performance within infocube compression, bulk INSERTs during data loads, common table expressions to reduce catalog congestion, avoiding unnecessary logging and many other things.
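
To give a flavour of the MERGE-based compression (again only a simplified sketch that reuses the made-up tables from the examples above, not the statement NW BI actually generates): compression folds the rows of the F fact table into the E fact table, adding up the key figures where a row with the same keys already exists:

  -- Simplified sketch: fold F fact table rows into the E fact table.
  MERGE INTO "/BIC/ESALES" e
  USING (SELECT KEY_SALEST, KEY_SALES1, SUM(AMOUNT) AS AMOUNT
           FROM "/BIC/FSALES"
          GROUP BY KEY_SALEST, KEY_SALES1) f
  ON (e.KEY_SALEST = f.KEY_SALEST AND e.KEY_SALES1 = f.KEY_SALES1)
  WHEN MATCHED THEN
    UPDATE SET e.AMOUNT = e.AMOUNT + f.AMOUNT
  WHEN NOT MATCHED THEN
    INSERT (KEY_SALEST, KEY_SALES1, AMOUNT)
    VALUES (f.KEY_SALEST, f.KEY_SALES1, f.AMOUNT);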

Hopefully this weblog has provided some insight. The next two will discuss some features that look attractive initially but that have proved not to be viable … at least up to now.


6 Comments


  1. Shailesh Unny
    Hi,

    The info provided was definitely enlightening. The question that springs to mind is why SAP would choose to restrict the characteristics by which the data in the E fact table can be partitioned. In many cases the fiscal month might not be a good candidate, based on data volumes etc. Also, in many instances the cubes themselves are split by time; in such cases partitioning by some other characteristic might make more sense.

    Thanks
    Shailesh

    1. Thomas Zurek Post author
      Hi Shailesh,

      The E fact table is tuned towards partition pruning and, as I mentioned, almost every analytical query has a filter on time. It doesn’t make sense to analyze sales data without defining the years or months that should be considered. That’s why time is a good choice.

      Besides that, values for years, months, fiscal periods etc. are known in advance (and without asking a user to provide them). This is important, as setting up a range-partitioned table requires you to define the ranges before the actual data arrives. Sometimes that information would be easy to provide, but it would still need to be provided explicitly. In other cases it would not be easy to determine in advance what a good choice of ranges could be.

      Thirdly, it can be assumed that the data is distributed reasonably evenly along time. If you were to partition by branches, countries or the like, you could expect significant data skew, which would trigger all kinds of demands for reorganizing the data.

      So overall, time is a very good choice, especially if we want to get away from providing a “knob” for each and every tunable thing inside our software. Critics already argue that TCO is too high and that administering our software should become easier. So here is a good example of how to keep it simple.

      Regards

      Thomas

      1. Shailesh Unny
        Hi Thomas,

        Thanks a lot for your reply. I would agree with all your points. But there are scenarios where the fiscal period might be too long a time period (say, for example, while analysing high-volume sales of music downloads) or too short a time period (say, for summarized financial ledger data). In such cases wouldn’t having additional time characteristics help? That said, I concur that additional “knobs” might be more difficult to assimilate.

        Thanks for the info.
        Shailesh

      2. Witalij Rudnicki
        Hello Thomas,
        I am a bit removed from the low-level view of DB operations, so I may make some wrong statements here.
        I am working with a BW application for planners. We have forecasts of product shipments loaded every day. In planning data warehouses we have two time characteristics: (1) ‘date of forecast’ – when the forecast is DONE, and (2) ‘forecasted date’ – when the particular event (the shipment of a given amount of the product) WILL OCCUR.
        The standard time characteristics 0CALDAY and 0CALMONTH are used to store the ‘forecasted date’ (2). We then need a separate characteristic for the ‘date of forecast’, or as we call it the ‘version’, which is a 10-character field like ‘XX20070215’. And this is the characteristic by which we would like to do partitioning, because:
        – we always select a particular ‘version’ into a report,
        – when removing old data from the cube, likewise only particular ‘versions’ are removed (of course, this is done at the load request level).

        So the basic purpose of this post was to give you an example where partitioning by something other than 0CALMONTH or 0FISCPER might make sense, and to ask for your point of view on potentially partitioning by the ‘version’ characteristic.

        Thank you,
        Vitaliy

  2. John Skrabak
    Nice piece.  In a previous job I ran an Oracle data warehouse where I had full access to the DB’s capabilities, and at times I get frustrated with the BW environment.  But you are right, it is a balancing act: if BW supported all features/options, you would need more DBA support, and there would be more opportunity for BW installations to make bad design decisions. SAP would then have to devote a LOT more time to the tools and interfaces trying to cover all the options.  The 80/20 rule applies well here.

    I struggle with where the line should be sometimes.  I really missed materialized views, but have come to really like BW’s aggregate capabilities, such as parent/child usage, the ability to embed navigational attributes in the aggregate if I want to (if it makes sense), and the same for hierarchies, fixed values, etc.

    Couple of tweaks I’d like to see to improve performance:

    Stars –
    We utilize Oracle’s star transformation, which embeds a STAR_TRANSFORMATION hint into the SQL.  For the large majority of queries that works out fine, but our daily reconciliation reports, which read everything or large chunks of the fact table, would perform better if the STAR_TRANSFORMATION hint could be suppressed to permit a full scan of the fact table.  Any way to add a query property for that?
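
    (Just to illustrate what I mean, reusing the made-up fact table names from the blog above; this is not an existing BW property, only a sketch of what I would like such a query property to produce:)

      -- Illustrative only: suppress the star transformation and permit a
      -- full scan of the fact table for a broad reconciliation-style query.
      SELECT /*+ NO_STAR_TRANSFORMATION FULL(f) */
             f.KEY_SALEST, SUM(f.AMOUNT)
        FROM "/BIC/FSALES" f
       GROUP BY f.KEY_SALEST;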

    Aggregate dimension table indices –
    When designing a cube, I take care to look at the characteristics that will be in a dimension and make sure those characteristics are in the cube in the order in which I want the non-unique index on the dimension table to be built, to reduce the number of full scans of the dimension tables.  This works great, but when I build an aggregate that has a dimension table, the non-unique index is built in the reverse order (bottom up), eliminating the chance of using the index.  It seems like that needs to be changed.
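
    (Again purely illustrative, with made-up names: with a composite non-unique index, the leading column determines which filters can use it.)

      -- Illustrative only: the column order of the dimension index matters.
      CREATE TABLE "/BIC/DSALES2" (
        DIMID        NUMBER(10) NOT NULL,
        SID_MATERIAL NUMBER(10) NOT NULL,
        SID_PLANT    NUMBER(10) NOT NULL
      );

      -- A filter on SID_MATERIAL alone can use this index, because it is
      -- the leading column ...
      CREATE INDEX "/BIC/DSALES2~01" ON "/BIC/DSALES2" (SID_MATERIAL, SID_PLANT);

      -- ... whereas an index built with the columns reversed would
      -- generally not be used for that filter:
      -- CREATE INDEX "/BIC/DSALES2~01" ON "/BIC/DSALES2" (SID_PLANT, SID_MATERIAL);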

    Partitioning –
    Waiting for when I can do it from the workbench for an ODS. And would like to make sure I have all the options beyond the basic range partitions.

    When I create an index for a time-partitioned ODS, I would like the option to specify whether the index is local or global.  A large ODS with, say, 120 partitions results in 120 index range scan operations against a local index vs. a single range scan on a global one.  Depending on the data and how users query the ODS, a global index is the better choice.  Won’t even ask about partitioning the index on another column….
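
    (A sketch of the two variants I mean, with made-up names:)

      -- Illustrative only: local vs. global index on a time-partitioned
      -- ODS-like table.
      CREATE TABLE "/BIC/AODS100" (
        DOC_NUMBER VARCHAR2(10) NOT NULL,
        CALMONTH   NUMBER(6)    NOT NULL,
        AMOUNT     NUMBER(17,2)
      )
      PARTITION BY RANGE (CALMONTH) (
        PARTITION P200701 VALUES LESS THAN (200702),
        PARTITION PMAX    VALUES LESS THAN (MAXVALUE)
      );

      -- Local index: one index segment per partition, so a lookup on
      -- DOC_NUMBER alone has to probe every partition's segment.
      CREATE INDEX "/BIC/AODS100~01" ON "/BIC/AODS100" (DOC_NUMBER) LOCAL;

      -- The global alternative (one segment, a single range scan) would be:
      -- CREATE INDEX "/BIC/AODS100~01" ON "/BIC/AODS100" (DOC_NUMBER);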

    When will I be able to add partitions to an existing cube rather than having to lay them all out up front?  There appears to be a function module for Informix to do this, but not one for Oracle.

    Parallel Processing –
    Would really like to have much more granular control, at an InfoSource/InfoProvider level rather than the broad BW-wide setting.  There may be certain loads, rollups, etc. that I need to go faster, but my server is not large enough to let everything loose in parallel.

    I think the real challenge for a lot of companies is the willingness to have a DBA devote enough time to learn the BI (BW) environment and work with the modelers and query builders so that models and queries can be designed with optimization in mind.  I think it would be an interesting survey to see how many installations have a DBA who knows the BW and works closely with the BW staff.  Usually, I think, they are focused on the R3 side and never really get to learn the nuances of the BW, e.g. what influences the SQL that is generated, such as restrictions on the ODS table or the master data table, etc.

    Again, nice thoughtful piece.

