Thoughts and questions about the HANA announcement
The HANA announcement conference call that followed Vishal Sikka’s keynote at SAP TechEd Bangalore was fairly interesting, though it didn’t pack a large amount of technical content. The detailed technical content was not really expected, but I’m always hopeful. What was provided was a short overview and directional talk by Vishal, as well as a talk by a reference customer that was very positive on the solution. The presentations were relatively short and the demo was skipped in order to provide plenty of time for Q&A. I came away with a lot of thoughts and questions, and I want to outline some of the more prominent ones.
Messaging and marketing vs. the technical reality
My interest in this topic was motivated primarily by my perception of a gap between SAP’s marketing around their “in-memory” solutions and the actual technical solutions that are provided. Why are these solutions (BIA, BWA, and now HANA) so much faster than traditional reporting solutions? The answer is only partly that these solutions run in RAM, or memory. I was pleasantly surprised to hear a much more disciplined message around the fact that columnar data structures and inverted indices, compression, and massively parallel processing of queries are all enabling technologies alongside RAM-based storage and cache-aware algorithms.
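To make those enabling technologies a bit more concrete, here is a toy sketch (my own illustration, not HANA internals) of how a dictionary-compressed column store with an inverted index can answer an analytic filter without touching whole rows:

```python
# Toy illustration only -- data and structures are invented, not HANA's.

rows = [
    {"customer": "ACME",   "region": "EMEA", "revenue": 120},
    {"customer": "Globex", "region": "APJ",  "revenue": 75},
    {"customer": "ACME",   "region": "EMEA", "revenue": 200},
]

# Row store: a filter drags every full row through the CPU.
row_total = sum(r["revenue"] for r in rows if r["region"] == "EMEA")

# Column store: each column is its own array; repeated values are
# dictionary-compressed into small integer codes.
region_dict = ["EMEA", "APJ"]   # distinct values
region_codes = [0, 1, 0]        # one small code per row
revenue = [120, 75, 200]

# Inverted index: value code -> row positions, so a filter becomes a
# lookup plus a narrow scan of one numeric column.
inverted = {}
for pos, code in enumerate(region_codes):
    inverted.setdefault(code, []).append(pos)

emea = region_dict.index("EMEA")
col_total = sum(revenue[pos] for pos in inverted[emea])

assert row_total == col_total == 320
```

The compressed codes are also what make RAM-resident storage and parallel scans pay off: far fewer bytes move through the caches per row.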
I think being clear about the underlying technologies is going to pay dividends as customer technologists are better able to strategize about good scenarios for deploying SAP’s HANA product. This should result in more satisfied customers and fewer situations where the benefit is not as great as expected.
ETL, and Vishal Sikka putting the smack-down on yours truly
I (and hopefully a few of my Twitter followers) had a moment of surprised laughter as one of my tweets was picked up by the conference call moderator and read aloud to the call as a question. Based on what followed, I’m not sure whether I should be annoyed or relieved that my name was not mentioned, but I’ll take responsibility for it here 🙂 In any case, the tweet was:
To which Vishal quickly responded, “That is FALSE.” I’m pretty sure he used all caps. He seems to have a special tone of voice for that! He went on to remind the attendees that the reference customer on the call had 70 source systems feeding into HANA.
Now, I know the statement is technically false, but I think it is still important to ask this question. The idea is to get people thinking about the ETL area and the relationship between source systems and HANA tables. My impression is that HANA is a data store at heart – specifically the ICE or IMCE or whatever it is called today. Along with the database engine, HANA delivers several reporting integration services as well as data acquisition services (Sybase Replication Server and BusinessObjects Data Integrator).
I don’t know much about Sybase Replication Server, but my understanding is that it clones a database table and then keeps the clone in sync with the original in near real time. This is the tool the reference customer was using to populate the HANA system with data from the 70 ERP countries. What was not made clear on the call is whether you can use Replication Server to write all of the data from your 70 source systems into one big HANA table, or whether you write it into 70 tables and can then join them in HANA using SQL if you want to. I’m betting it is the latter, but I am basing this on nothing more than a suspicion. If someone who knows can weigh in, that would be great; otherwise I guess we’ll all find out when the documentation is released.
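If it is indeed one replicated table per source system, I would expect the consolidation pattern to look roughly like the sketch below. Everything here is hypothetical – the table and column names are invented, and SQLite merely stands in for whatever SQL layer HANA exposes:

```python
import sqlite3

# Hypothetical sketch: each source system lands in its own replicated
# table (sales_sys1, sales_sys2, ...); a cross-system view is then
# stitched together with UNION ALL. Names are invented for illustration.
con = sqlite3.connect(":memory:")
systems = (1, 2, 3)  # stand-in for the 70 source systems
for n in systems:
    con.execute(f"CREATE TABLE sales_sys{n} (material TEXT, amount REAL)")
    con.execute(f"INSERT INTO sales_sys{n} VALUES ('M-100', {n * 10.0})")

union = " UNION ALL ".join(
    f"SELECT '{n}' AS source, material, amount FROM sales_sys{n}"
    for n in systems
)
con.execute(f"CREATE VIEW sales_all AS {union}")

total = con.execute("SELECT SUM(amount) FROM sales_all").fetchone()[0]
assert total == 60.0  # 10 + 20 + 30 across the three stand-in systems
```

Note that this only works painlessly when the replicated tables already share a structure – which is exactly the caveat below.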
The point is this: As far as I can tell, HANA does not solve your ETL problems for you. If you replicate 70 source system database tables into HANA that all have different semantics or structures, you are going to end up with 70 different data stores – one datamart per ERP. I do not envy the query developer that is asked to build a query consolidating all of those views. It sounded on the call like the reference customer was mostly concerned with executing queries locally in the source systems, so perhaps they weren’t particularly worried about querying across all of the data. It also sounded like the data from the 70 source systems was pretty well aligned, at least with regards to table structure (a minor miracle in the analytic reporting business), so this customer probably didn’t suffer much ETL-related pain. But if you do have an ETL or semantic integration problem, and I find these are two of the biggest problem areas in BI, I suspect that HANA is not going to help you fix it, though you could certainly use the tools delivered with HANA to build your own solution to the problem.
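As a concrete (and entirely invented) illustration of that do-it-yourself ETL work: when two source systems name and encode the same fact differently, someone has to write the mapping to a common schema by hand before a cross-system query means anything. Field names and rules below are made up:

```python
# Invented example of semantic harmonization work HANA itself won't do.
# Two systems describe the same sales fact with different fields/encodings.
sys_a = {"KUNNR": "0001", "NETWR": 100.0, "WAERS": "EUR"}   # SAP-style names
sys_b = {"cust_id": "C-1", "net_usd": 110.0}                # home-grown names

def harmonize_a(rec):
    # Strip leading zeros from the customer key; currency is explicit.
    return {"customer": rec["KUNNR"].lstrip("0"),
            "net": rec["NETWR"], "currency": rec["WAERS"]}

def harmonize_b(rec):
    # This system only ever stores USD, so the currency is implied.
    return {"customer": rec["cust_id"],
            "net": rec["net_usd"], "currency": "USD"}

unified = [harmonize_a(sys_a), harmonize_b(sys_b)]
assert all(set(r) == {"customer", "net", "currency"} for r in unified)
```

Multiply that by 70 systems with drifting customizations and you have the real ETL problem, regardless of how fast the target engine is.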
I would love to see more information on this topic so that we can determine if what is written above is accurate!
Query speed improvement
This reference customer was a very good candidate for an amazing speedup using HANA for one particular reason: they were doing analytical queries on a relational ERP (read: highly normalized) database. This is, pretty universally, a bad idea. That is not to say it is rare – a huge number of reports of an analytic nature are run off of ERP databases. The promise that HANA plus Replication Server makes it easy to move these reports to an engine optimized for analytics is probably this version of HANA’s biggest selling point, and it is the use case the reference customer was there to tell us about.
But as it is, a customer who is already running these types of reports out of a BI or data warehousing tool should not see the same massive 2000x speedup in query execution (5 seconds in HANA vs. 3 hours on the ERP system). The speedup should be big, but it probably won’t be this big. Something more like 100x would be realistic in these situations, with some queries and datasets getting significantly better boosts while others see much less improvement.
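For the arithmetic-minded, the quoted numbers work out as follows (the 500-second warehouse baseline is my own hypothetical, chosen to match the 100x case):

```python
# Working through the speedup figures quoted on the call.
erp_seconds = 3 * 60 * 60    # 3 hours on the ERP system = 10800 s
hana_seconds = 5             # 5 seconds in HANA

speedup = erp_seconds / hana_seconds
assert speedup == 2160.0     # roughly the "2000x" headline number

# A warehouse that already answers the same query in, say, 500 seconds
# (hypothetical baseline) would see a much smaller, though still large, win.
warehouse_seconds = 500
assert warehouse_seconds / hana_seconds == 100.0
```

The headline number is real, in other words, but it measures the distance from the worst starting point, not from a tuned warehouse.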
So that’s it. Those are my thoughts, or the bigger ones at least. I’m becoming more and more positive on HANA as more information on the product comes out. Hopefully this blog will stimulate conversation, corrections if necessary (all caps, please), and the sharing of more information about this product. We need it in order to make good decisions about this new (dare I say “innovative”?) technology from SAP.