Arun Varadarajan

Twinkle Twinkle little STAR, do I really need you at all..?

Initially, Enterprise Data Warehousing (EDW) was all about cubes, data marts, STAR schemas and designing them properly for good performance… A lot of thought went into designing the cubes and making sure that the cube tables were optimized, and the usual performance check compared the size of the fact table with the dimension tables to make sure that we did not go over the prescribed 20% limit unless it was inevitable.
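
(As a back-of-the-envelope illustration of that 20% rule of thumb, here is a minimal Python sketch with hypothetical table names and row counts – in a real BW system the same figures are typically pulled from reports such as SAP_INFOCUBE_DESIGNS.)

```python
# Hypothetical row counts for an InfoCube's fact and dimension tables --
# in a live system these would come from database statistics.
fact_rows = 10_000_000
dimension_rows = {
    "ZSALES_DIM1 (customer)": 450_000,
    "ZSALES_DIM2 (material)": 120_000,
    "ZSALES_DIM3 (document)": 6_500_000,  # suspiciously close to the fact table
}

RATIO_LIMIT = 0.20  # the "20% of the fact table" rule of thumb

for name, rows in dimension_rows.items():
    ratio = rows / fact_rows
    verdict = "OK" if ratio <= RATIO_LIMIT else "review design (line-item dimension?)"
    print(f"{name}: {rows:>10,} rows ({ratio:.1%} of fact) -> {verdict}")
```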

Now, enter the hardware accelerator, which uses in-memory computing and compresses the data in our cubes greatly to speed up reporting. It also makes it easier for designers to design cubes, because ultimately the columnar processing of the accelerator would kick in when all else fails…

Taking the case of SAP BW…
Does this mean …
a. Cube design is no longer critical for query performance..?
Since the BWA is going to index the columns and compress the data, if we had very poorly designed cubes with big, fat dimension tables, the BWA would be able to handle them, since it sees each column differently.

As a scenario, let's assume that we have a cube with a dimension table which is 50% of the fact table – either there are too few entries in the fact table (deletion of requests without removing the dimension entries…) or the design is not very good, leading to too many entries in the dimension table. In BWA, since the columnar indexing will take care of such duplication, should this design gaffe be considered forgivable..?

Even if we do take the effort to set the cube design right by distributing the values properly across dimension tables, in the BWA scenario this should give us a performance benefit that is hardly noticeable, because the original query itself already runs very, very quickly…

b. With BWA 7.3 – should we look at cubes at all?
Since DSOs can be indexed, the EDW becomes a vast store of DSOs and InfoSets which could be used to report on the data – we would then save the space being used for cubes, aggregates, indexes etc…

This would then mean that we would have a transparent table structure which could be used for all reporting purposes.

This brings us back to the same question… is cube design still relevant..?

c. HANA

With HANA bringing in real-time analysis of transaction data, with HANA being planned for BW as well… and maybe, if there were also access to HANA using remote InfoProviders (at a possible later stage)… do we need to maintain any data at all in BW..?

This calls into question whether the principles of Inmon and Kimball with regard to data modeling and dimensional data modeling are still relevant in today's world of accelerated reporting using hardware-based in-memory computation…

Is this the new data warehousing that we need to get used to and get familiar with..?

 

It is not only SAP BW that has in-memory accelerator capabilities – we have Exalogic, NetApps, Vertica etc., to name a few – and the same concepts apply to all of the above, which start to question the concepts of ROLAP, MOLAP, HOLAP etc.

Do you think that these time-honored concepts are still valid and will continue to remain so going forward?


      12 Comments
      Kenneth Murray
      This is a very exciting topic.  Thanks for bringing it up.  It's really revolutionary.  I'm afraid that the technology will be kept too expensive though.  They should have HANA without the appliance to give the rest of us a chance of having a great ride.
      Former Member
      I have raised the same question in many forums but have not received a definite answer. If a cube is designed for flexibility and improved access, then with the improved performance of HANA the same structure could be created dynamically.

      Regards

      Witalij Rudnicki
      In my humble opinion, the answer to the question depends on whether we want to have a sci-fi discussion (nothing wrong with that) or a fact-based discussion. The problem with a fact-based discussion is that today there is still not much known about the behaviour of BW 7.3 and HANA in reality. But because I always have a tendency to discuss based on facts, let me use at least what is known to add my voice to the conversation.
      First of all - the question is absolutely right, and unfortunately I do not know the answer. Let me first add to Arun's points:
      a. Cube design IS critical in my opinion, and this is why. Indeed, if you put a cube with hundreds of thousands or even a few million fact table records (a very typical initial scenario for a BWA implementation) into BWA, it might seem irrelevant whether the dimension is 10% or 50% of the fact table. But what if we grow to trillions of records? A scenario that maybe is not available today, but should now be possible with in-memory technology? Would bad design still have little impact compared to good design? I think NO. In great simplification, there are two major parts to JOIN execution: read the data from the tables and then join the data. Yes, in-memory greatly reduces the time for the first step because the disk I/O bottleneck is eliminated. But to execute the join in an MPP environment you need to either redistribute the complete set of dimension data to all nodes where the join with the fact table is executed, or rehash the dimension table to the nodes based on the fact table hash key. It might be unnoticeable if it takes 10 ms for a well-designed schema vs 50 ms for a badly designed one on small amounts of data, but if we move to enormous amounts, 10 s vs 50 s will become obvious.
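      (A toy cost model, purely for illustration – not BWA internals, and with entirely made-up scan rates, network rates and node counts – of why the dimension-shipping part of such a join, negligible at pilot volumes, dominates at scale:)

```python
# Toy MPP join cost model -- illustrative only; all rates and node counts are assumptions.
SCAN_RATE = 200e6    # rows/second each node scans locally (hypothetical)
NETWORK_RATE = 20e6  # rows/second of dimension data a node can receive (hypothetical)
NODES = 16

def query_time(fact_rows, dim_ratio):
    """Local fact scan in parallel + broadcasting the whole dimension to every node."""
    dim_rows = fact_rows * dim_ratio
    local_scan = (fact_rows / NODES) / SCAN_RATE
    dim_broadcast = dim_rows / NETWORK_RATE
    return local_scan + dim_broadcast

for fact_rows in (5_000_000, 5_000_000_000):
    good = query_time(fact_rows, dim_ratio=0.10)  # well-designed: small dimensions
    bad = query_time(fact_rows, dim_ratio=0.50)   # badly designed: dimension ~50% of fact
    print(f"fact rows {fact_rows:>13,}: good design {good:8.3f}s, bad design {bad:8.3f}s")
```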
      b. Ideally this should be answered by Thomas Zurek or someone from his team. But as far as I know, because BWA is still based on the star schema concept and not on 3NF joins, a DSO is converted by BW 7.3 into an InfoCube during the load into BWA 7.2. This might sound like good news, because now it is the software's responsibility to build a correct schema (Paul, this can be connected to your statement).
      c. While there are many applications already using an in-memory approach, I haven't heard of any of those listed – Exalogic (I believe Arun meant Exadata), NetApps, Vertica – using it. I will appreciate any links to learn more. There is one more confusion around "in-memory" in ICE. Recalling TechEd sessions, SAP was mentioning Jim Gray's "RAM locality is King". The meaning of that is that the bottleneck today is the TIME TO READ DATA FROM RAM!
      Besides my attempt to clarify some names (http://bit.ly/eUkzrO), I see there is still confusion, like in Ken's statement "HANA without the appliance". That is like saying "MS Windows without an operating system", because HANA *is* the name of the appliance. I believe Ken meant having a downloadable SAP In-Memory Computing Engine (ICE) without the need to buy it with the hardware. And here is a caveat: ICE is built to run specifically on the Intel Nehalem-EX CPU architecture. So, unless there is other news that I am not aware of, HANA cannot be broken into separate pieces.
      Regards. -Vitaliy
      Former Member
      Hi Vitaliy,

      >>>But what if we grow to trillions of records? A scenario that maybe is not available today

      We are in that scenario already and are now doing some remodelling, dropping and reloading data, and archiving. It would be better to do it the right way the first time.
      Your message is very clear and I agree.

      Sanjay

      Former Member
      Hi All,

      The DWH concept is radically changing nowadays, in both its concepts and its implementation, as required by growing industry trends and innovations.
      Maybe in the near future the old concepts of EDW and star schemas will no longer be valid points for argument…

      Due to in-memory concepts and cloud computing features, the traditional SAP BW consultant role may become endangered in the industry in future…

      Regards,
      Purnend

      Former Member
      This is what most BW analysts are saying w.r.t. HANA: "no more InfoCubes"… I think data warehousing is not going to die, because it established core concepts like cleansing, consolidation, harmonization and data access security that don't change with technology advancements. Maybe the star/snowflake schemas won't be necessary, since they originated to speed up data retrieval rather than data updates – which means that HANA won't employ cubes while at the same time running transactions (OLTP).
      One large table is less expensive to update and join than a cube, given the hardware/columns/MPP/compression etc. I might sound more sci-fi than realistic, but I tend to think that HANA will erase the line between OLTP and OLAP. I guess the real struggle will be in migrating traditional applications (thousands of RDBMS-optimized OLTP and OLAP apps) to HANA. This concern must be of higher priority to SAP than building new preconfigured HANA apps.

      We're all excited about this era of in-memory computing and we might see in-cache computing in our time!

      Regards... Ali.

      Arun Varadarajan
      Blog Post Author
      Data warehousing is not dead… it is just that previous tenets are being questioned and examined for their validity in a world largely based on hardware acceleration, as opposed to the intuitive design it was based on earlier…
      Former Member
      Hi Arun,

      >>>Since the BWA is going to index the columns and compress the data, if we had very poorly designed cubes with big, fat dimension tables, the BWA would be able to handle them, since it sees each column differently.

      The performance difference may not be noticeable, but you will be using more memory on the blades, and blade software is expensive. You will also be using more CPU to process those huge joins.

      Sanjay

      Arun Varadarajan
      Blog Post Author
      I do understand that bad modeling is bad… but consider this scenario…

      Let's say I have a cube that has about 10 million records right now and is okay with performance. When the cube was initially designed, it was designed for about 1–2 million records, and most likely, due to changing requirements, it is poised to grow to a trillion-record cube in about 2 years… Now there is a little bit of hesitation in touching such a big cube to remodel it or change the design for performance… and even if we want to, it becomes a one- or two-month initiative which involves testing, backup, data loading and ensuring enough DB space, and which involves team members from BW, DBA and Basis. Looking at the resource cost and timelines – and since the cube is so big – we might have reporting disruptions for about 2–3 days perhaps..? or add 1 week to avoid that by way of a copy cube!!!
      Instead of doing all this, it might be easier and in some cases more cost-effective to add another HANA/BWA box and get it over with… and avoid most of the steps above…
      My 0.02..
      Arun

      Klaus Nagel
      Hi Arun and others,

      please let me share my comments and thoughts on the modeling aspects of BW and how they change with the increasing adoption of in-memory technology in BW. Let’s do this in chronological order of the BW and BWA releases:

      BW 7.0x with BWA 7.00/7.20: Having a “bad” InfoCube design, i.e. dimension tables comparably large to the fact tables, is still bad – but not always hopeless with BWA. If the data is replicated from the database to BWA in a 1:1 fashion, BWA has to perform the JOIN between the large dimension table and the large fact table at query runtime. In general, large fact tables are split and distributed across the blades, so in order to perform the JOIN the dimension table has to be sent to all blades that contain facts (it gets even worse if the dimension table is split as well). In such a scenario the communication costs dominate the query response time. BW/BWA has two ways to deal with such a scenario (see the sketch after this list):
      •     BWA recognizes such a scenario at the time of the creation of the BWA index of the InfoCube – and decides to split the large dimension table exactly the same way as the fact table. This way the JOIN can be executed locally on each blade.
      •     In case there is more than one large dimension table (a very “bad” design), the above does not work. So BW recognizes this and decides to index the InfoCube data in BWA in a “flat” format. The data is read from the database in a broad format, dissolving all dimensions (it looks like an InfoCube with only line-item dimensions) – this way the JOINs are already materialized and do not influence performance at query runtime. The memory consumption of such a “flat” InfoCube in BWA does not differ significantly from the normalized version – thanks to the columnar storage and special BWA dictionary compression. But: reading the data from the database with all the JOINs may put a heavy load on the database and may take some time.
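
      (A schematic illustration of the first option – distributing the large dimension table on the same key as the fact table so that every blade can join locally, with no dimension data shipped at query time. Invented blade count and data, Python for illustration only, not BWA code:)

```python
# Schematic co-partitioning sketch -- blade count, keys and data are all made up.
NUM_BLADES = 4

def blade_for(dim_id):
    # Distribute rows by hashing the join key (the DIMID).
    return hash(dim_id) % NUM_BLADES

fact_rows = [(d, d * 10.0) for d in range(1, 1001)]                 # (dim_id, key figure)
large_dim = {d: {"customer": f"C{d:04d}"} for d in range(1, 1001)}  # dim_id -> attributes

# Distribute BOTH tables by the same key: the join partner of every fact row
# ends up on the same blade, so no dimension data moves at query time.
blades = [{"fact": [], "dim": {}} for _ in range(NUM_BLADES)]
for dim_id, kf in fact_rows:
    blades[blade_for(dim_id)]["fact"].append((dim_id, kf))
for dim_id, attrs in large_dim.items():
    blades[blade_for(dim_id)]["dim"][dim_id] = attrs

# Local join per blade (runs in parallel in the real system).
result = [(blade["dim"][dim_id]["customer"], kf)
          for blade in blades
          for dim_id, kf in blade["fact"]]
print(len(result), "joined rows, zero cross-blade dimension traffic")
```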

      BW 7.30 with BWA 7.20: Basically it is the same as with BW 7.0 for all of your existing InfoCubes, but for new InfoCubes you now have the option to choose the property “BWA-only”. In this case the InfoCube data is no longer persisted in the database, but is written directly into BWA. In this case you can model the dimensions of the InfoCube as you like: either use just one, or group the characteristics semantically – you do not have to take any performance aspects into account! The reason for this is that the physical layout of the InfoCube indexes (BWA “tables”) is always “flat”, i.e. the inner part of the InfoCube snowflake is dissolved as described above. Despite the new flexibility in defining dimensions, the “BWA-only” InfoCubes have the additional benefit of improved data load times, since no DIMIDs (the foreign keys between fact and dimension tables) have to be created – this performance improvement can be significant in many cases.
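
      (A rough sketch of what “no DIMIDs have to be created” saves at load time – the classic star-schema load has to look up or create a surrogate dimension key for every characteristic combination, while the flat layout writes the characteristics straight into the fact structure. Hypothetical field names, Python for illustration only:)

```python
# Hypothetical incoming records for a tiny InfoCube load.
incoming = [
    {"customer": "C001", "material": "M10", "amount": 100.0},
    {"customer": "C002", "material": "M10", "amount":  50.0},
    {"customer": "C001", "material": "M11", "amount":  75.0},
]

# Classic InfoCube load: each characteristic combination needs a surrogate
# DIMID, i.e. a lookup (and possibly an insert) into the dimension table per record.
dim_table, fact_table = {}, []
for rec in incoming:
    combo = (rec["customer"], rec["material"])
    dimid = dim_table.setdefault(combo, len(dim_table) + 1)  # lookup-or-create
    fact_table.append({"DIMID": dimid, "amount": rec["amount"]})

# "Flat" (BWA-only / in-memory optimized) load: characteristics are written
# straight into the fact structure -- no DIMID lookups at all.
flat_table = [dict(rec) for rec in incoming]

print("star layout:", dim_table, fact_table)
print("flat layout:", flat_table)
```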

      BW 7.30 based on HANA as the database (this option is planned to have RTC in October 2011): If HANA is the database of your BW, you can leave everything, every object, as it is today, and it functions as today (just faster :-)), but you also have the option to switch to special “in-memory” types. For InfoCubes, for example, you can migrate your classic ones to “in-memory optimized InfoCubes”, which basically look like a “flat” BWA index, i.e. they have only one fact table with only line items – but you can still create dimensions in the InfoCube maintenance to group characteristics semantically (again without taking performance considerations into account). As with the “BWA-only” InfoCube, you additionally benefit from the improved loading times, since no DIMIDs have to be created.
      Now, to answer the obvious question upfront: what is the difference to DSOs then? In the BW-on-HANA case you can think of an InfoCube as a SID-based DSO representing the “reporting layer” in your Enterprise Data Warehouse. In the case of a 1:1 relation between the DSO and the InfoCube, you may even consider switching to reporting on the DSO directly (it is “in-memory”, as is all your data), but there may be cases (complex queries) where the key-based layout of the DSO is slower than the SID-based layout of the InfoCube.

      I hope this answers your questions and sheds some light on the discussion of how BW evolves with in-memory technology – and highlights the fact that BW (and the data warehouse case) is not “dead” with in-memory technology; on the contrary, it can now focus even better on its prime goal: enabling insight into consolidated, high-quality data.

      Regards, Klaus

      Arun Varadarajan
      Blog Post Author
      Klaus,
      Thanks a lot for the insight into how BWA works… but then I have another concern as well – this is not related to the technology but more to the way projects are done… In this case, let’s say I have a cube with about 10 million rows. The cube was initially designed for 1 million rows and is performing well with aggregates. Now, due to increasing scope, the cube starts fielding about 100 million rows and the performance starts to drop. In response, various options are evaluated, including redesigning the cube / reloading the data etc., and when comparing the time and effort required to do this, it is decided to implement BWA since it has a larger impact. In this case, is in-memory/BWA an alternative to good design..? and as the cubes get bigger, more blades can be added…
      Of course, you can still perform activities like partitioning, archiving etc. to keep the cube size down, but BWA gives you the option of keeping the entire data set active, which is a major bonus…
      Klaus Nagel
      Arun,
      BWA can be a "work-around" for some or even many "bad modelling" issues, or may even make the notion of a "bad" model obsolete. But depending on your BW release there are still some things to take into account – see my comments above.