Thomas Zurek

Infocubes and Data Store Objects … and HANA

In recent weeks, I’ve frequently heard or read comments like “cubes will no longer be relevant in HANA” or “flat structures like data store objects (DSOs) will be sufficient”. There is no binary (= simple) answer to this, like a clear YES or NO. This blog intends to shed some light on those assumptions. You are encouraged to challenge my thoughts or to add your point of view.

First of all: analytic views in HANA are cubes. They are multidimensional abstractions of an arbitrary schema of DB tables. In other words: the concept of a cube has made it into the new world of in-memory. While this might surprise some people in marketing who have hoped that the (alleged) end-of-life of cubes would provide them with a differentiator for in-memory (vs. the old world), it should not surprise anyone who has created a multidimensional data model at some stage, in any analytic environment, during the past decade.

What is important in this context is to distinguish between the conceptual and the physical aspects of a cube:

  • conceptual:
    • multidimensional: dimensions (characteristics), dimension attributes (nav. attributes), hierarchies, measures (key figures)
    • update via delta feeds*
  • physical:
    • (relational) star schema
    • (relational) snowflake schema
    • row vs. column store
    • proprietary format, like in MOLAP engines

In-memory mainly impacts the underlying physics of a cube. The conceptual setup remains valid. So, when people make comments like those cited above, it is fair both to agree and to disagree, depending on what you have in mind.
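To make the distinction concrete, here is a minimal Python sketch; all tables, fields and data are invented for illustration and are not HANA’s or BW’s actual structures. The same physical star schema serves several conceptual, multidimensional perspectives:

    # A tiny physical star schema: one fact table plus one dimension table.
    fact_sales = [
        {"product_id": 1, "month": "2011-10", "revenue": 100.0},
        {"product_id": 2, "month": "2011-10", "revenue":  50.0},
        {"product_id": 1, "month": "2011-11", "revenue":  70.0},
    ]
    dim_product = {  # master data: product attributes
        1: {"name": "Laptop", "product_group": "Hardware"},
        2: {"name": "Office", "product_group": "Software"},
    }

    def cube_query(facts, dim, group_by):
        """Conceptual cube: aggregate a measure along chosen dimensions."""
        result = {}
        for row in facts:
            attrs = dim[row["product_id"]]
            key = tuple(attrs.get(g, row.get(g)) for g in group_by)
            result[key] = result.get(key, 0.0) + row["revenue"]
        return result

    # Two different multidimensional views of the same physical tables:
    print(cube_query(fact_sales, dim_product, ["product_group"]))
    print(cube_query(fact_sales, dim_product, ["product_group", "month"]))

Whether those tables live in rows, columns, on disk or in memory changes the physics, not the concept.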

The HANA modeler explicitly distinguishes concept and physics via the logical view and data foundation tab strips. The Universe Designer and the Information Design Tool (IDT) in BI 4.0 support a similar approach, namely to impose a multidimensional view on an arbitrary schema of tables. In BW, the approach is the reverse, namely to derive the physics (= the table schema) from the conceptual view. BPC also follows such a pattern, even more rigidly. Both approaches have their strengths and drawbacks, e.g. the flexibility to work on any given schema (HANA modeler, universes) or the ability to optimize (around a pre-defined schema) and allow write-back (BW, BPC and basically all EPM tools). In any case, the notion of a cube as both a concept and a physical schema exists.

Now, let’s quickly consider DSOs**. The technical term originates in BW. Still, I’m convinced that many handcrafted data warehouses have handcrafted mechanisms to do (more or less) what a DSO does, namely to translate UPDATE-based feeds (or after-image feeds) into delta feeds. In BW, a DSO can consume (after-image) loads from an extractor that is not capable of providing delta loads. One big advantage of feeding deltas into the higher LSA layers is simply performance: just imagine what would happen if transformations had to be processed on all data – rather than on the delta feeds – every time.
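As a rough illustration of that translation – a minimal Python sketch with invented records, not BW’s actual implementation – a DSO-style activation compares each after-image with the active record and emits only the difference:

    active = {}      # active data: key -> current record
    change_log = []  # the delta feed that higher layers consume

    def activate(after_images):
        """Turn after-image records into before/after delta entries."""
        for key, new_rec in after_images:
            old_rec = active.get(key)
            if old_rec == new_rec:
                continue  # nothing changed, nothing to propagate
            if old_rec is not None:
                # before-image (in real BW with reversed signs; simplified here)
                change_log.append(("before", key, old_rec))
            change_log.append(("after", key, new_rec))
            active[key] = new_rec

    activate([("order-1", {"qty": 10})])  # initial load
    activate([("order-1", {"qty": 12})])  # repeated full after-image
    print(change_log)  # only the changes, ready for delta processing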

So what are the conceptual and physical aspects of a DSO:

  • conceptual:
    • flat: key, (not-to-be-aggregated) attributes, measures (= attributes that can be aggregated)
    • update (incl. delete, insert) via after-image feeds
  • physical:
    • flat tables, e.g. active data – activation queue – change log
  • black-box objects consisting of a variety of indexes, as in the HANA version that will underlie BW

In conclusion, this means that the concepts (underlying cubes and DSOs) will continue to be valid and viable while in-memory will impact the physics underneath. In-memory will make those physics less of a concern in the future. Still, cubes as multidimensional perspectives on sets of data are not an inch less significant. It follows directly from the above that the version of BW that will sit on top of HANA will still have infocubes and DSOs, albeit with changed physical layouts. Stay tuned …


* The conceptual significance of deltas in analytic calculations and data flows within a data warehouse must not be underestimated and is worth a separate blog. In the context of in-memory technologies they are even significant on the physical level as they enable INSERT-only setups.
** The discussion focuses on standard DSOs and neglects other DSO types for the sake of simplicity.

18 Comments
Witalij Rudnicki
One question that remains for me: if I build my data model from scratch (not a replication of ERP tables and not running BW on IMDB), like in the case of 3rd-party data loaded via DS, should I model the SQL tables underneath the analytic view as a flat structure (DSO-like) or a star schema (cube-like)? Will appreciate your thoughts on this. -Vitaliy
Thomas Zurek (Blog Post Author)
      Vitaliy,
I understand your question as: if I have the freedom to choose the table layout underneath an analytic view, what would I do?
From a query performance perspective, it's certainly the case that the less computation (especially joins) you have, the better. But this comes at the expense of maintaining a denormalized schema. In general, HANA has its query performance strengths where BWA already had them. So a simple star schema might be a good compromise between query performance and maintenance effort. In summary, it's a similar train of thought as with other RDBMSs. Still, the negative effects of modeling something the wrong way should be significantly reduced. This is why HANA's general value proposition is to focus on minimizing the maintenance costs (for a table schema, e.g. through a replication approach) while neglecting query performance considerations, as the latter should still be good enough in most cases.
      Thomas
Ethan Jewett
      Hi Thomas,

When I think of this I've been primarily thinking about the physical aspect, as my assumption has been that the conceptual aspects of cubes, DSOs, etc., will remain valid.

      What I don't quite understand is the idea that a star-schema approach will remain relevant in a columnar database like HANA. (I'm assuming that columnar tables will be used for analytic data.) I've always seen the star schema approach as a compromise between the ideal, totally de-normalized, analytic data model and the reality that de-normalization uses a ton of space. With HANA's columnar architecture and compression hopefully negating the space penalty for de-normalization, why wouldn't we want to model our cubes as single flat tables? Am I missing an aspect of the star-schema approach that also improves performance?

      Cheers,
      Ethan

Thomas Zurek (Blog Post Author)
      Hi Ethan,
in a nutshell, I see the impact of in-memory technology in that - during data modeling - you don't have to be as concerned about query performance but can focus on optimising the update/load setup of your structures. I'm sure that queries on a completely denormalised schema are faster than on a somewhat normalised one (as in the star schema). However, the difference might be in a range that the end user does not even notice (0.6 sec might be 3 times slower than 0.2 sec but is likely perceived as the same performance).
I see the star schema still as a good trade-off that allows to separate master from transactional data. Just imagine if the "product group" of a "product" (typically modeled as an attribute of a characteristic "product") was baked in as a column into one flat table. The granularity of the latter could be POS data of a retailer selling this product. Now if the assignment of a "product" to a "product group" changed, this could require an UPDATE on millions of rows!
One of the most outstanding pieces of the BW model is the external hierarchies. They are highly cherished by customers as they allow easy realignment, similar to the example above. With them, it is super easy to simulate new org structures, assignments and other similar changes.
So in summary, I believe that master and transactional data have two separate lifecycles (updates, deletes, inserts). For that reason it makes sense to keep them separate. The penalty on query performance should be negligible. Still, all HANA discussions and statements focus solely on the latter, which - in my personal opinion - falls short of a wider analytic approach.
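To see the difference in effort, here is a deliberately simplified Python sketch (invented data and structures; this is not how BW stores anything):

    # Variant A: "product group" baked into every (POS-granular) fact row.
    flat_facts = [{"product": "P1", "product_group": "G1", "qty": 1}
                  for _ in range(100_000)]  # imagine millions in reality

    # Variant B: master data kept separate from the transactional data.
    facts = [{"product": "P1", "qty": 1} for _ in range(100_000)]
    product_master = {"P1": {"product_group": "G1"}}

    # Realignment: product P1 moves from group G1 to group G2.
    for row in flat_facts:          # variant A: update every affected fact row
        if row["product"] == "P1":
            row["product_group"] = "G2"

    product_master["P1"]["product_group"] = "G2"  # variant B: one single update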
      I hope this helps.
      Thomas
Ethan Jewett
      Hi Thomas,

      Thanks, that was really helpful.

      I think when most people talk about a star schema they are talking about baking master data into the dimension tables and not about the extended star schema with conformed dimensions that BW provides (is that a "snowflake" schema?). I totally agree that the power of this approach should not be sacrificed. But it's also a real pain to maintain this sort of schema without a serious DW toolkit like BW.

      Frankly, I had not thought very hard about the BW extended star schema in the context of HANA. I just assumed your group would take care of it! I definitely take your point about the power of denormalizing master data. I'll try to roll it into that blog about OLAP schemas in relational vs. columnar databases that I promised a long time ago.

      Cheers,
      Ethan

Thomas Zurek (Blog Post Author)
      Hi Ethan,
so the in-memory cubes will look basically like infocubes that have only "line-item dimensions", i.e. the layer of (BW) dimension tables disappears. Its origin was the limitations of classic RDBMSs anyway. Characteristics are (mostly) the dimensions in the actual sense. Also, the E/F fact table split vanishes.
      Best
      Thomas
Ethan Jewett
      Hi Thomas,

      That makes perfect sense. Thanks for the clear explanation!

      Ethan

Witalij Rudnicki
I assume that with BW 7.3 running on HANA, the BW InfoCube structures will still remain the same - i.e. the layer of "BW dimensions" will still be there? Do you plan to change current InfoCubes in later releases, or to introduce new types with characteristics-as-dimensions, removing the limits on the # of dimensions and # of KFs?
      Thank you. -Vitaliy
Thomas Zurek (Blog Post Author)
An in-memory infocube will still have BW dimensions, i.e. from the outside you cannot spot whether it's a standard or an in-memory infocube. Those dimensions only exist logically, e.g. to group characteristics in the query designer. However, physically there won't be any (BW) dimension tables. It is similar to the use of a "dimension" in a multiprovider: it's only a concept to group related characteristics together.
There won't be any limitations regarding the # of characteristics or key figures. What I mean is that characteristics are the dimensions in the (non-SAP, non-BW) sense. Characteristics in BW and attribute views in HANA are conceptually pretty much the same thing. Regards
      Thomas
Witalij Rudnicki
Thomas, I understand that already in BW 7.30 SP5 the in-memory DSO will have a different physical structure, with one delta-enabled IMDB columnar table and the current New/Active/Log being views. And that we need to re-activate a DSO to change its structure from "classic" to "in-memory". Does that mean that this new in-memory InfoCube structure will be available with the same 7.30 SP5, and all we will need is to re-activate the InfoCubes after the DB migration of BW to HANA? Appreciate you taking the time to address community questions!! -Vitaliy
Thomas Zurek (Blog Post Author)
      Vitaliy,
it's a bit early to discuss all details of those objects. Let me describe it roughly: basically, with an infocube in BW on HANA you will have an additional check box to choose between the standard physical storage (extended star schema) and the in-memory optimized layout (see above). So, infocubes of both types can coexist. The same applies to DSOs.
For both objects, we are currently discussing and looking at automated or tool-supported migration options to switch from the standard physical storage to the in-memory optimized version. What that will look like, whether it's feasible, how much effort it is (for us) to implement etc. is not completely worked out. The worst case is to manually create copies of existing infocubes (DSOs), then set the "in-mem optimization flag" and switch the data loads to the new version. But we hope that we can do better than that.
Alexander Schuchman
While trying to plan our long-term roadmap, we ran into an interesting question.
We will clearly upgrade from BW 7.01 to BW 7.30 and then migrate our database from the current database provider to the NewDB database.
Then decommission our existing BWA hardware attached to this BW system.

Of course we will start to build new applications in the BI 4.0 toolset, but for existing productive applications built in the 3.5-based query designer, will they get an incremental performance improvement using 7.30 with NewDB vs. 7.01 and BWA?

Thomas Zurek (Blog Post Author)
      Hi Alex,
BEx queries, independent of whether they have been built with QD 3.5 or QD 7.0, sit on BW infoproviders. The changes and impacts will happen "underneath" the latter. So, yes, all queries can benefit equally.
We will start to talk more about BW 7.3 on HANA in the next few weeks. But I have indicated above that the infocube and the standard DSO will have "in-memory versions" (= physics optimized towards HANA). We plan to provide easy migration options in both cases; this is currently under investigation. This means: to the outside (conceptual view) both infocube and DSO will look the same, albeit being changed underneath.
      Thomas
Alexander Schuchman
      Looking forward to being one of the first customers to go-live with BW on top of HANA!
Former Member
Hi Thomas.
BW 7.3 on HANA sounds like evolution.

Infocubes and standard DSOs having "in-memory versions" - a significant improvement of BW.

But please don't forget about features like partitioning in HANA.
Infocubes and standard DSOs in "in-memory versions" (for short - MemCubes and MemDSOs) are very good, but I think it will be even faster with partitioning on the cube (or DSO) tables in HANA.

Right now in BW we can partition only by two time characteristics (Month, FiscMonth), which is not enough; very often you need more flexible partitioning (for example Plant/Day or CompCode/Month).
Also, PSA tables in BW can become very huge, but we can't create partitions as we need.

I think you know all of BW's minuses perfectly well,
but if we want to be ready for the "billions-of-rows world" in BW-on-HANA, we must use all HANA features.

If not in BW 7.3, maybe in the next release.

Former Member
Cubes and Analytic Views represent the same concept: the star schema. But to me, cubes are not Analytic Views. The way they are built is very different. Analytic Views need only the metadata of what is in the source system, while cubes need extra data (the SID tables) not present in the source systems, plus transformation and (possibly) consolidation. So Analytic Views are more suitable for - as you put it - "operational reporting, i.e. data from a single source, real-time, no consolidation, no transformation", while cubes are more suited for "data warehousing & reporting …"
      HN
Witalij Rudnicki
      Hello Thomas,

I got two additional questions, which a person from HANA's Solution Management was not able to answer for me:
1/ How did you implement deletions from column-based tables, e.g. during selective deletion or package deletion in an InfoCube, or during master data clean-up?
2/ I looked at the data model of the in-memory InfoCube during TechEd in Las Vegas, with its flat structure and only one physical P dimension. Technically, the only difference (besides the P dim) between an IC's fact table and a DSO's active table is that the fact table is based on surrogate keys while the active table is based on business keys. What are the advantages of the in-memory version of the InfoCube over DSOs, besides the fact that BEx queries require SIDs, to my knowledge?
      Thank you,
      -Vitaliy

Thomas Zurek (Blog Post Author)
      Hi Vitaliy,
ad 1: no idea, I need to find out myself. I speculate that the existing BW approach for deletion (if more than X % of the data is to be deleted, then create a new table, fill it, drop the old table, and rename the new table to the old table name) might have been adjusted for that.
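Purely to illustrate that speculated pattern - a self-contained Python toy in which a plain dict stands in for the database; the real mechanics in HANA may well differ:

    # "Rebuild instead of delete": all names and structures are invented.
    tables = {"fact": [{"plant": p, "qty": 1} for p in ("A", "B", "A", "A")]}

    def selective_delete(name, keep, threshold=0.1):
        """Rebuild the table from surviving rows if more than `threshold`
        (= the X % above) of its rows are to be deleted."""
        rows = tables[name]
        survivors = [r for r in rows if keep(r)]
        if len(rows) - len(survivors) > threshold * len(rows):
            tables[name + "_new"] = survivors         # create + fill new table
            del tables[name]                          # drop the old table
            tables[name] = tables.pop(name + "_new")  # rename new -> old name
        else:
            rows[:] = survivors                       # small deletion: in place

    selective_delete("fact", keep=lambda r: r["plant"] != "A")
    print(tables["fact"])  # -> [{'plant': 'B', 'qty': 1}]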
ad 2: Well spotted. The SIDs are still extremely valuable for OLAP processing. Therefore having them at hand early on is an advantage, albeit a shrinking one. You are right, from a query perspective infocubes and DSOs have become very close. That's why we believe that reporting directly on DSOs will become a real option with BW-on-HANA. This would make infocubes that have a 1:1 connection to a DSO obsolete and as such would simplify the architecture. On the other hand - and that is what I've been trying to get at in this blog as well - don't look at infocube vs. DSO purely from a query perspective. Consider also the update mode.
      Regards
      Thomas