Why BW and HANA are such a good combination
There has been much talk and debate on whether HANA replaces BW. See Steven Lucas’s blog, the associated comments, discussions on LinkedIn or Twitter (e.g. start somewhere around here) or some blogs (e.g. this one) and probably many other places. The sheer number and heat of those threads indicate, in my opinion, that the question matters, which in turn underlines how relevant BW is.
Figure 1: An EDW comprises a DB that is complemented with application code (X).
Now, a lot has already been said about the matter. The origin of the controversy has a little bit to do with the perception that an enterprise data warehouse (EDW) basically is a DBMS. Actually, in a discussion with some well-known analysts some time ago I popped the question whether they would see a difference and met a few seconds of silence. In my opinion, a DBMS is a fundamental component of an EDW, like an engine to a car. But the DBMS alone makes no EDW, just as an engine alone makes no car. The car’s chassis, body, interior, … provide the comfort for the consumer and make sure that the engine is used in a proper way. Similarly, there is software surrounding a DBMS that makes it the heart of an EDW. BW is simply one such instance of software that can do that. Figure 1 shows a slide that I frequently use to create that awareness.
Why is BW such a good choice – in my opinion – to make HANA the heart of an EDW? Rather than answering this in a generic, philosophical and potentially non-tangible way I will give three instances that clearly underline that statement:
- BW has the most capable code generator (for queries) to leverage HANA’s calculation engine in the most efficient way. Even experts will struggle to emulate this with handcrafted code. The trick is to issue a graph of related calculation instructions whose dependencies stem from the calculation semantics (of an OLAP query) and are exposed to the calculation engine which, in turn, can leverage that knowledge to optimize the processing. “Hello World” style or simple, SQL-style queries won’t see a notable difference, but queries with calculations beyond a single row (YTD, hierarchies, currentmember, normalizations to group totals, group-specific aggregations, …) will, and these are neither esoteric nor rare but mainstream.
- Changes in BW are creating the right context that allows HANA to process load operations in a way that translates into overall accelerations of factor 3 (on average) for process chains. Examples are the translation of single-row to mass-data operations or the HANA-optimized versions of InfoCubes and DSOs. In fact, I’ve seen customer measurements indicating that many process chains yield only equal performance (compared to classic DBMSs) as they are, but improve significantly once the HANA-optimized InfoCubes and DSOs are used.
- In autumn, both BW and HANA are planned to ship a mechanism to distinguish between hot (= permanently used / queried), warm (= sporadically used, e.g. in nightly batch processes) and cool (= rarely used, e.g. old requests in PSA) data. This will translate into a significantly better usage of HANA’s memory, similar to what one would get if the data compression algorithms were to yield higher compression rates. As the roles and semantics of tables in BW are well known, they can easily be classified by a default configuration, which means that a BW-on-HANA system will by default use HANA more efficiently than an operational data mart, where such settings can also be made but need to be derived and set manually.
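To make the first point a bit more tangible, here is a minimal sketch of two "beyond a single row" calculations, YTD and normalization to group totals, written by hand in plain Python. The sample data and the function names are made up for illustration; this is not what BW generates nor how the calculation engine works internally. It only shows that every result value depends on other rows of the same group, which is the kind of dependency an optimizer can exploit when it is told about it.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical sample data: (region, month, revenue)
rows = [
    ("EMEA", 1, 100.0),
    ("EMEA", 2, 150.0),
    ("EMEA", 3, 50.0),
    ("APJ", 1, 200.0),
    ("APJ", 2, 100.0),
]


def year_to_date(rows):
    """Running total of revenue per region, ordered by month."""
    by_region = defaultdict(list)
    for region, month, revenue in sorted(rows, key=lambda r: (r[0], r[1])):
        by_region[region].append((month, revenue))

    result = {}
    for region, items in by_region.items():
        running = accumulate(revenue for _, revenue in items)
        for (month, _), ytd in zip(items, running):
            result[(region, month)] = ytd
    return result


def share_of_group_total(rows):
    """Each row's revenue divided by the total revenue of its region."""
    totals = defaultdict(float)
    for region, _, revenue in rows:
        totals[region] += revenue
    return {
        (region, month): revenue / totals[region]
        for region, month, revenue in rows
    }


ytd = year_to_date(rows)
share = share_of_group_total(rows)
```

Note that neither result can be computed by looking at one row in isolation: the YTD value needs all earlier months of the region, the share needs the region total first.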
This list can certainly be extended. It is meant to provide a non-marketing, down-to-earth flavour of what you gain with BW beyond the usual services. It is not surprising that some customers have lauded it the most genuine application on and for HANA that SAP provides.
The original post of this blog can be found here.
I'm still struggling to understand why SAP-HANA wouldn't replace BW. Here is why:
As we all know, the term Business/Data Warehouse came from traditional warehouses used to store the manufactured goods of a factory. Large trucks are used to move the goods to the consumers' locations. The trucks' speed is limited. So factories try to develop methods/algorithms/procedures to reduce the moving cost and at the same time reduce the time lag between the time of manufacture and the time the goods become available to the consumer. One way they do this is by storing goods in the warehouses temporarily and shipping them in bulk.
Now let us say some warehouse owner is working on developing a new truck engine which could move goods very fast, meaning the finished goods could be made available to the consumers immediately after they're manufactured, regardless of the distance (something like Just-In-Time). In this case, I would presume the new engine would replace the warehouses, wouldn't it? This transition wouldn't take place overnight; however, at some point it would happen. If this is not the case, then I would expect the warehouse owner to have a problem explaining why the new engine wouldn't replace the warehouses.
I guess SAP-HANA Versus BW discussion is similar in nature. Here is my analogy at actor level:
IMO, new truck engine's application would be simpler than Warehouse application because we don't need to combine several consumers' demands in one truck. Similarly SAP-HANA application doesn't need to worry about several layers as Vishal used to say - PSA, DSO, Cube, Data Loading, Cleansing, Filtering, Aggregates etc.
In a nutshell, X in your equation EDW = DB + X would/should become simpler; SAP would like to continue calling X as BW even with SAP-HANA as DB; however it should be a lot simpler than BW so IMO warrants a new name.
Does this explain why the "HANA replaces BW" discussion is not dying?
I think this comment does explain why the myth persists, partially by exemplifying some of the issues that we are disagreeing about 🙂 I like your analogy, so let me address it in terms of your analogy:
It's very interesting that some of the most advanced supply chains in the world (Walmart, for example) rely heavily on routing a significantly higher % of goods through distribution centers (aka warehouses) compared to competitors. Walmart also supposedly has one of the largest datawarehouses in the world. Food for thought 🙂
At some point the analogy breaks down and we have to start talking about the actual issues though 🙂 Truthfully, if a company is just using BW for basic siloed reporting scenarios and not as a datawarehouse, then I think it doesn't offer much value on top of HANA, except perhaps being a more mature application framework. You can even manually do aggregates and such in HANA if you need to. But if you want an actual datawarehouse management toolkit with all that implies, then BW adds a ton of value on top of HANA.
P.S. Not really a direct response to what you wrote, but it's worth saying: "Datawarehouse" doesn't imply "aggregates" - see the whole discussion of logical datawarehousing that Gartner has been pushing, and which I think BW has arguably pioneered as a semantic modeling tool for datawarehousing.
I guess I was oversold on the capabilities of SAP-HANA or RDF of Vishal? At any rate, by new truck engine, I implied a new generation, game changing engine capable of delivering anything, everything in a fraction of a second to anywhere - this would be similar to how SAP-HANA was introduced. In one of Hasso's lectures, I remember him saying that there would be no aggregation anywhere. I reviewed the collection of videos by Hasso; couldn't find the one in which he says everything will be calculated on the fly (will let you know when I find that). All rules have an exception, so I don't know if you're talking about those exceptions when you said:
or you believe there would be more exceptions?
Based on what I read on other DB vendors' sites and here, SAP-HANA looks more like the next version of a more expensive BWA than a next generation DB. To put it another way, I see a big gap between promise and delivery.
I certainly believe there is a gap between perception and reality, if not between promise and delivery. 🙂 Until SAP releases benchmarks and more people get hands-on, we won't really know. At best, what will happen is that current processes that are 100x too slow will become manageable, while current processes that are 1000x or more too slow will remain out of reach. There are a ton of those 100x processes, so HANA can provide a lot of value, but there are also a ton of those 1000x processes.
The reality is, I think, somewhere between what SAP has been saying and what the other database vendors have been saying. When we look at database technology as a whole, HANA is simply not as revolutionary as SAP would lead us to believe (just look at IQ). But HANA is important because it integrates existing technological approaches into a package in a way that's only been accomplished by a few other players in very small volumes. Combined with SAP's scale and ability to bring this approach to a mass market, I think HANA can make a huge difference in companies' ability to manage their data.
http://www.youtube.com/watch?v=A0AadKKUV7o discusses one system for both OLTP/OLAP systems. This looks real and revolutionary. 🙂
one perspective to look at your analogy might be to differentiate the Operational Data Store from the Data Warehouse.
Using your example, there is an excellent chance that the new engine (HANA DB) used in ECC (ECC on HANA) and in HANA applications will make any redundant data transfer and data storage in an Operational Data Store superfluous. 100% agreement that HANA can replace Operational Data Stores.
The vision for the HANA DB is to support both, OLTP and OLAP in a single data base, which allows real-time reporting in the ECC transactional system.
Considering, that BW is today used in both scenarios, Operational Data Store and Data Warehousing, the purpose of BW as a Data Warehouse will remain, besides that BW is the foundation for the planning and consolidation applications BPC, IP and BCS.
Typical Data Warehouse tasks could be for example:
· Combined analytics from multiple source system, if a company has
· To maintain history, if the source systems do archive data
· To combine real-time data with the data from any of the scenarios above
The great opportunity with BW on HANA is, that on a single DB (HANA) it is possible now to run a consolidated Data Warehouse of
· Architected Data Marts based on BW tools
· Operational Data Marts using HANA data modeling tools with or without BW Business Suite meta data replication
· Agile Data Marts using both BW and HANA data modeling tools sharing SAP Business Suite metadata, master data and transaction data
Maybe it is fair to say that the new HANA DB provides a new engine
When someone discusses/mentions BW, mainly two things come to my mind:
My perception for two+ years has been that SAP-HANA would get rid of both aggregation and the layers. If the purpose of BW as a data warehouse will remain in SAP-HANA environment, then I guess SAP-HANA is not going to get rid of (1) and (2).
Thanks for your input,
Thank you Bala,
for your interest and your feedback.
With regards to aggregates:
The need for aggregates has already been eliminated with the use of BWA (BW Accelerator), which is based on the same In-Memory technology that HANA is using.
That means for BW implementations which already use the BWA, there is no change with regards to aggregates after the migration from BW on a traditional RDBMS to the HANA data base. These customers switch off their BWAs after the migration.
For BW implementations without BWA, the active aggregates are no longer necessary, as the data of the InfoCubes is loaded into the memory of the HANA data base. These customers can de-activate all their aggregates, stop the loads and delete any aggregates.
Customers with BW on HANA have experienced a compression rate of at least 4-5 for their data after the migration to a HANA DB. Customers presenting at SAPPHIRE 2012 have reported factors of 6 (California Edison), 7 (Lockheed Martin) and 9 (USHA international, Japan).
With regards to layers:
With BWA, and even more so without it, the individual customer design of the BW data model can incorporate several layers of InfoProviders for aggregation purposes. For example, one layer could be a DSO with Line Item granularity, another layer a DSO with Document granularity, and another layer an InfoCube with aggregated Document and Line Item information using dimensions like Customer, Material etc.
With BW on HANA, all the aggregated layers can be eliminated if their whole purpose was data aggregation for performance purposes. Therefore, if the Line Item DSO in the example above contains all the required Business logic, then this DSO can be loaded into In-Memory and used for reporting.
This is the rationale why some sources say that InfoCubes are no longer required with BW on HANA. InfoCubes are still mandatory to support Planning applications, like BPC (Business Planning and Consolidation) or IP (Integrated Planning), and may still be useful for specific requirements.
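The layer example above can be sketched roughly as follows: instead of persisting a Document-level DSO and an aggregated InfoCube, both views are derived on the fly from the Line Item records. This is a plain Python illustration only; the record layout, the field names and the `aggregate` helper are hypothetical and do not correspond to any BW or HANA API.

```python
from collections import defaultdict

# Hypothetical line-item records: (document, item, customer, material, amount)
FIELDS = ("document", "item", "customer", "material", "amount")

line_items = [
    ("D1", 10, "C1", "M1", 40.0),
    ("D1", 20, "C1", "M2", 60.0),
    ("D2", 10, "C2", "M1", 25.0),
    ("D3", 10, "C1", "M1", 75.0),
]


def aggregate(records, group_by):
    """Sum 'amount' over the given grouping fields at query time.

    Nothing is stored: every call recomputes the result from the
    line-item records, which stands in for the "no persistent
    aggregated layer" idea.
    """
    key_positions = [FIELDS.index(name) for name in group_by]
    amount_position = FIELDS.index("amount")
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[i] for i in key_positions)
        totals[key] += record[amount_position]
    return dict(totals)


# What used to be the Document-granularity DSO:
by_document = aggregate(line_items, ["document"])

# What used to be the InfoCube with Customer and Material dimensions:
by_customer_material = aggregate(line_items, ["customer", "material"])
```

The trade-off is the usual one: the derived views are always consistent with the line items and need no load process, at the price of recomputing them per query, which is exactly what an in-memory column store is supposed to make affordable.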
BW on HANA will still require an LSA (Layered Scalable Architecture), but a streamlined version. A BW on HANA migration definitely provides potential for simplifying the customer-specific BW data model.
In the HANA data mart residing side by side with BW on the BW on HANA system, the basic concept is that all data models on top of the data foundation are views only, which means no physical data aggregation or data layers. That also includes the consumption of the HANA data mart views in BW Composite Providers.
In BW on HANA and in the HANA data mart there is absolutely no need for aggregates.
BW will have some streamlined layers of persistent InfoProviders based on individual customer design for specific Business requirements, of which those containing hot data can be allocated to In-Memory, and cold data to NLS (Near-Line storage).
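As a rough illustration of the hot / warm / cool idea from the blog post and the hot vs. cold allocation mentioned here, a default classification by table role with manual overrides could look like the sketch below. The role names, the default mapping and the fallback to "hot" for unknown roles are all assumptions made for this example; they are not the actual BW or HANA configuration.

```python
# Assumed default mapping from table role to data temperature.
# The roles and their temperatures are illustrative only.
DEFAULT_TEMPERATURE = {
    "infocube": "hot",        # permanently queried
    "dso_active": "hot",      # permanently queried
    "dso_changelog": "warm",  # touched sporadically, e.g. in nightly batches
    "psa": "cool",            # rarely used, e.g. old requests
}


def classify(tables, overrides=None):
    """Return a temperature per table name.

    tables:    mapping of table name -> role
    overrides: optional mapping of table name -> temperature that
               takes precedence over the role-based default
    Unknown roles fall back to "hot" (an assumption: when in doubt,
    keep the data in memory).
    """
    overrides = overrides or {}
    return {
        name: overrides.get(name, DEFAULT_TEMPERATURE.get(role, "hot"))
        for name, role in tables.items()
    }


# Hypothetical tables of a small system
tables = {
    "SALES_CUBE": "infocube",
    "SALES_DSO": "dso_active",
    "SALES_DSO_LOG": "dso_changelog",
    "SALES_PSA": "psa",
    "CUSTOM_STAGING": "unknown_role",
}

temperatures = classify(tables, overrides={"SALES_DSO_LOG": "cool"})
```

The point of the sketch is the difference in effort: where table roles are known, the default dictionary does the work; where they are not (as in a hand-built data mart), every entry ends up in `overrides` and has to be derived manually.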
Have you read this blog? http://scn.sap.com/community/data-warehousing/blog/2011/01/28/the-latest-hot-tech-vs-fundamental-datawarehousing-tradeoffs--remember-bw
It provides a whole list of challenges that datawarehousing technologies attempt to deal with. It's not a complete list, but it's a good start 🙂 Out of the 12 items on the list, BW helps deal with 11 of them with various levels of success. The two you list correspond roughly to subsets of the items "data volume" (which is I think what you and Erich mean by "Layers", though dealing with volumes of granular data is certainly not the only reason for layering concepts like LSA) and "performance" (in the case of aggregates). Your description, while a very common conception, is a small slice of the problem space that BW addresses.
My point is that the idea most people seem to have about BW - that it is just about extracting and aggregating data from SAP ERP for performance reasons - is not a realistic description of what BW is capable of when used well. Indeed, some companies use BW this way, and in my opinion they are not getting a great deal more value out of it beyond what HANA can now provide with DXC connections to business content extractors. But companies using BW with datawarehouse concepts are often using it in a much more sophisticated way.
It seems like SAP needs to do some marketing of BW as a general purpose datawarehouse toolkit 🙂 However, marketing BW has always been a weakness at SAP 🙁
yes, I was differentiating between layers, which we can get rid of, referring to Bala's question, and layers of the LSA Layered Scalable Architecture, which can be streamlined.
As you point out, the LSA layers are not unique to BW, they are general EDW layers like for example for Data Harmonization from different sources, EDW transformations and Business transformations.
The premise of HANA as a single data base for BW and the HANA data mart is to complement each other to streamline the layers in the future more and more by pushing most transformations into In-Memory to avoid persistent layers where possible to increase agility.
The challenge from a DW modeling perspective is to find the balance between the requirements for persistent layers for GRC requirements, auditing and archiving of constant, predictive, validated and reconciled query results on one hand (single source of truth), and transformations on the fly for agility on the other hand.
That is the beauty for me with BW on HANA, that we can run EDW functions and agile Data Mart functions on a single data base, HANA. Single data base means for me elimination of data latency and minimization of data redundancy while maximizing agility.
If you have further questions, please do not hesitate. This is a very healthy and important discussion.
Please watch this interview with Vishal and let me know your thoughts:
Vishal eloquently and passionately talks about running analytics on raw, operational data from a single base with no aggregation. He also mentions the customers are turning off their data marts and data warehouses (Timeline: 5m).
what Vishal describes in the video was already shown during Hasso's keynote at SAPPHIRE 2010, when he was demoing exactly these scenarios.
When Vishal talks about reporting from raw, operational data from a single base with no aggregation he is referring to HANA's capabilities to run OLTP and OLAP in a single data base. SAP Business ONE is currently in ramp-up, and ECC on HANA will be forthcoming.
As I said further above in this blog, there is an excellent chance that the new engine (HANA DB) used in ECC (ECC on HANA) and in HANA applications will make any redundant data transfer and data storage in an Operational Data Store superfluous.
With regards to Data Warehousing, I believe there is still a need for data harmonization when running ERP systems from multiple vendors and to keep persistent data layers for GRC purposes.
In this blog you can find more information w regards to Data Warehousing needs and HANA:
Next generation, revolutionary technology would make that list irrelevant and obsolete; however, due to what you said y'day, they would be relevant for some time:
Vishal in his blog:
True, real-time performance, in my opinion, would be possible only if we run reports on data as it changes. This would require elimination of layers such as source systems, delta mechanism, PSA, DSO, cubes, aggregates, BWA etc. This would make LSA less relevant. This is my understanding based on the TechEd keynote and subsequent conversation on social media.
In TechEd 2011 keynote, Vishal says something like this: (timeline 25m) http://www.sapvirtualevents.com/teched/sessiondetails.aspx?sId=38
With LSA, would Vishal be able to provide realtime information on suppliers for that CEO?
This is a really interesting discussion.
As I say in my blog, I don't think most of these problems are technological problems, so it is difficult for a pure technology to fix them. Performance tends to be more susceptible to technological solutions than the other problems I list, but performance is also unique because technology is undermining its own advances - it is getting faster, but it is also generating vastly more data. I don't think these problems will become obsolete unless we can find really new ways of thinking about data. BW doesn't make any of these problems go away, but it explicitly provides a toolkit to help us think through the problems.
I think "real-time" (a term I hate, but I know what you mean 🙂 ) is an interesting challenge for most existing datawarehouse architectures, but people have been working over the last 10-15 years to introduce fast updates into datawarehouse architectures. BW has supported this capability for some time using RDA (Real-time Data Acquisition) for loading data up to once per minute from either SAP or a web service datasource.
These RDA DSO structures can be included in an LSA concept pretty easily. LSA doesn't require that data actually be persisted at every (or any) layer in the architecture, so using BW on HANA one could certainly persist data at the harmonization or operational-data-store layer with up-to-the-minute accuracy using RDA. Personally, I think this solution has been underused as a way to address operational reporting requirements in a way that maintains a semantics consistent with the datawarehouse.
I hope that there are some new concepts in the works as well for leveraging more source-agnostic technologies like Sybase Replication Server and stream-processing frameworks.
Let me explain why revolutionary technology such as SAP-HANA would make that list irrelevant.
(After watching Vishal's video and Hasso's lectures, I'm convinced what SAP promises is real:). I also know that technology needs to mature, it may take a few years; but good news is that they've cracked it. BW versus SAP-HANA discussion is more political than technical. Technically I as a customer wouldn't invest in BW regardless of what others say. Sorry.).
RDA or Active Datawarehouses (Remember this one from NCR's Teradata :( ) are just placeholders for true, real time systems. We unfortunately don't have the skilled manpower to implement those near real-time technologies. And the customers ask: What do I get?
The difference between RDA or Active Datawarehouse and SAP-HANA is: One is near real time and the other is true real time. Let us get real!
I'm more optimistic about SAP-HANA today than y'day.
If something is not clear, I would be more than happy to discuss.
I can't say I agree, but at this point we've probably discussed it to death in this comment thread and I'm having a hard time following the argument in this bullet-point format. Would you be willing to blog on the topic?
In the meantime, I'll try to get my more in-depth explanations of these challenges re-posted to my blog. It seems that it got lost in a migration and so the links in the blog I sent you are now broken.
I'm currently listening to your DSLayered podcast and I'm encouraged by Steve Lucas's statement that he welcomes the discussions around HANA versus BW. That's a good thing. In order to get the discussions going in the right direction - if everyone agrees, then there is no fun, right? - I guess I need to explain my thoughts very well. I believe it would benefit everyone including me. So I'll write a blog soon.
I'm looking forward to reading your in-depth explanations. I kinda assumed what you meant by those 12 items.
I'm enjoying the conversations on HANA versus BW.
Thanks and best regards,
BW on HANA vs HANA + DS.
BW has many legacy restrictions.
But HANA + DS - it's a multi-million EUR solution.
If all someone sees when they look at BW is legacy restrictions, then you're exactly right as far as that organization is concerned. If someone looks at BW and sees lots of restrictions that are helpful for developing a datawarehouse (and recognizes that one would have to architecturally impose many of these restrictions without BW using custom-designed tooling, a different DW toolkit, or an extremely disciplined review process), then I think that organization might draw a different conclusion.
More accurately:
> If someone looks at BW and sees lots of restrictions that are helpful for developing a "datawarehouse"
DWH in quotes.
Unfortunately it's true.
BW still doesn't have any flexible partitioning management.
Not sure what you're getting at here. At 27:30 Prakash starts talking about the point we're discussing. He directly addresses the issue at 31:00. I think the discussion here is an excellent overview of why you need a management framework (e.g. BW) to effectively address datawarehousing challenges.
Thanks for pointing me to this presentation. It's an excellent resource. Kind of sad I didn't go see this session at TechEd.
At 30:44 they are saying that this is what the EIM portfolio focus is on.
> you need a management framework (e.g. BW)
Yes, but there are only one or two phrases about BW. It's not all about BW. BW is not core anymore for analytics.
Prakash is playing it cunning. I don't believe that Prakash has worked with SAP BW for a long time. HANA with MDX and all types of views (analytical, calculation etc.) must always be much faster than BW on HANA.
But I agree with you that most BW customers are also ERP customers.
They don't have much data, many users, or many data sources.
For them BW is a very good option, that's true.
Prakash may be cunning 😉 He's definitely smart enough! I'm pretty sure he's forgotten more about BW than I ever knew 🙂
"Analytics" and "datawarehousing" are not necessarily the same thing. They tend to have very different requirements. I agree with you that BW is not core for analytics, though it can certainly be very useful. But it definitely seems to me that BW is core to SAP's DW and larger data management strategy. If you look at the slides for items 4, 5, and 6 in the presentation, BW is central to all of them.
Part of the problem may be that the presentation isn't really about DW concepts, so it's a bit difficult to derive SAP's DW strategy from this particular presentation. The sessions on LSA++ are a bit more clear on the topic and I recommend at least taking a look at the presentations. BW 7.3 is introducing a lot of support for logical datawarehousing concepts and allowing a lot more flexibility for agile datamart concepts than existed in the past, so that's pretty exciting.
could not agree more with you when you say "BW is core to SAP's DW and larger data management strategy". There is definitely more to come from SAP with regards to BW.
If you are looking for more details on the SAP DW strategy, this link might be useful, where Prakash describes the different product roadmaps:
Great posting. An EDW based on a convergent model which encompasses SAP BW and SAP HANA is a very powerful solution that can and should be exploited by SAP customers throughout the world!