Infocubes and Data Store Objects … and HANA
In recent weeks, I’ve frequently heard or read comments like “cubes will no longer be relevant in HANA” or “flat structures like data store objects (DSOs) will be sufficient”. There is no binary (= simple) answer to this, no clear YES or NO. This blog intends to shed some light on those claims. You are encouraged to challenge my thoughts or to add your point of view.
First of all: analytic views in HANA are cubes. They are multidimensional abstractions of an arbitrary schema of DB tables. In other words: the concept of a cube has made it into the new world of in-memory. While this might surprise some people in marketing who had hoped that the (alleged) end-of-life of cubes would provide them with a differentiator for in-memory (vs. the old world), it should not surprise anyone who has created a multidimensional data model at some stage, in any analytic environment, during the past decade.
What is important in this context is to distinguish between the conceptual and the physical aspects of a cube:
Conceptual:
- multidimensional: dimensions (characteristics), dimension attributes (nav. attributes), hierarchies, measures (key figures)
- update via delta feeds*
Physical:
- (relational) star schema
- (relational) snowflake schema
- row vs. column store
- proprietary format, like in MOLAP engines
In-memory mainly impacts the underlying physics of a cube. The conceptual setup remains valid. So, when people make comments like those cited above, it is fair to both agree and disagree, depending on whether they have the concept or the physics in mind.
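To make the split between concept and physics concrete, here is a minimal Python sketch. It is only an illustration, not how HANA or BW work internally: the table names, column names and the helpers `aggregate` and `rows_from_columns` are all made up for this example. The same conceptual query (a measure summed by a dimension attribute) is answered from a row-oriented and from a column-oriented fact table.

```python
from collections import defaultdict

# Physical layer, variant 1: a tiny star schema with a row-oriented fact table.
# All table and column names are hypothetical.
product_dim = {
    1: {"product": "Laptop", "category": "Hardware"},
    2: {"product": "Monitor", "category": "Hardware"},
    3: {"product": "Support", "category": "Services"},
}
sales_rows = [
    {"product_id": 1, "year": 2011, "revenue": 1000.0},
    {"product_id": 2, "year": 2011, "revenue": 400.0},
    {"product_id": 3, "year": 2011, "revenue": 250.0},
    {"product_id": 1, "year": 2012, "revenue": 1200.0},
]

# Physical layer, variant 2: the same facts stored column by column.
sales_columns = {
    "product_id": [1, 2, 3, 1],
    "year": [2011, 2011, 2011, 2012],
    "revenue": [1000.0, 400.0, 250.0, 1200.0],
}


def rows_from_columns(columns):
    """Present a column-oriented table as a stream of row dictionaries."""
    names = list(columns)
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


def aggregate(facts, dim, attribute, measure):
    """Conceptual layer: sum a measure grouped by a dimension attribute."""
    totals = defaultdict(float)
    for row in facts:
        member = dim[row["product_id"]][attribute]
        totals[member] += row[measure]
    return dict(totals)


# The multidimensional question is identical; only the physics differ.
by_category_rows = aggregate(sales_rows, product_dim, "category", "revenue")
by_category_cols = aggregate(
    rows_from_columns(sales_columns), product_dim, "category", "revenue"
)
```

Both calls return the same revenue per category. That is the point made above: the cube as a concept does not care which physical layout sits underneath.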
The HANA modeler explicitly distinguishes concept and physics via the logical view and data foundation tabstrips. The Universe Designer and the Information Design Tool (IDT) in BI 4.0 support a similar approach, namely to impose a multidimensional view on an arbitrary schema of tables. In BW, the approach is the reverse, namely to derive the physics (= the table schema) from the conceptual view. BPC also follows such a pattern, even more rigidly. Both approaches have their strengths and drawbacks, e.g. the flexibility to work on any given schema (HANA modeler, universes) or the ability to optimize (around a pre-defined schema) and allow write-back (BW, BPC and basically all EPM tools). In any case, the notion of a cube, both as a concept and as a physical schema, exists.
Now, let’s quickly consider DSOs**. The technical term originates in BW. Still, I’m convinced that many handcrafted data warehouses have handcrafted mechanisms to do (more or less) what a DSO does, namely to translate UPDATE-based feeds (or after-image feeds) into delta feeds. In BW, a DSO can consume (after-image) loads from an extractor that is not capable of providing delta loads. One big advantage of feeding deltas into higher LSA layers is simply performance: just imagine what would happen if transformations had to be processed on all data (rather than on the delta feeds) every time.
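The translation from after-images into deltas can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BW's actual activation logic. The function name, the record layout and the sample order are hypothetical, and deletions are ignored. The idea is the one a change log with before- and after-images relies on: for a key that already exists, emit the old record with its measures negated and then emit the new record.

```python
def after_images_to_deltas(active, after_images, measures):
    """Turn after-image records into additive delta records.

    active       -- dict mapping record key -> current record (updated in place)
    after_images -- iterable of (key, record) pairs as delivered by the source
    measures     -- names of the fields that can be aggregated
    """
    deltas = []
    for key, new in after_images:
        old = active.get(key)
        if old is not None:
            # Before-image with reversed sign: cancels the old measure values.
            before = dict(old)
            for name in measures:
                before[name] = -old[name]
            deltas.append((key, before))
        # After-image: contributes the new measure values.
        deltas.append((key, dict(new)))
        active[key] = dict(new)
    return deltas


# Hypothetical feed: the same order arrives twice, the second time changed.
active_data = {}
first_load = after_images_to_deltas(
    active_data, [("order-1", {"status": "open", "amount": 100})], ["amount"]
)
second_load = after_images_to_deltas(
    active_data, [("order-1", {"status": "shipped", "amount": 120})], ["amount"]
)

# Summing all delta records yields the current amount of the order.
total_amount = sum(record["amount"] for _, record in first_load + second_load)
```

A consumer further up only has to add the delta records to what it already holds. It never needs to re-read or re-transform the full data set, which is where the performance advantage comes from.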
So what are the conceptual and physical aspects of a DSO?
Conceptual:
- flat: key, (not-to-be-aggregated) attributes, measures (= attributes that can be aggregated)
- update (incl. delete, insert) via after-image feeds
Physical:
- flat tables, e.g. active data, activation queue, change log
- black box objects consisting of a variety of indexes, as in the HANA version that will underlie BW
In conclusion, this means that the concepts (underlying cubes and DSOs) will continue to be valid and viable while in-memory will impact the physics underneath. In-memory will make those physics less of a concern in the future. Still, cubes as multidimensional perspectives on sets of data are no less significant. It follows from the above that the version of BW that will sit on top of HANA will still have infocubes and DSOs, albeit with changed physical layouts. Stay tuned …
* The conceptual significance of deltas in analytic calculations and data flows within a data warehouse must not be underestimated and is worth a separate blog. In the context of in-memory technologies, they are even significant on the physical level as they enable INSERT-only setups.
** The discussion focuses on standard DSOs and neglects other DSO types for the sake of simplicity.