SAP’s HANA: Removing the Shackles from Terrible Data Design
I’ve been talking a lot at work lately about Agile BI. The solution I’ve been trying to “sell” internally is to use Sybase Replication Server to make real-time copies of our transactional systems in a Sybase IQ-powered operational data store, with minimal data cleanup (for more of my thoughts, see Sybase Love Fest Part 1 – Data Solutions). I’d build the lion’s share of the semantic layer right on top of this copy and only use traditional ETL for the most complex and worst-performing use cases. This makes BI “agile” because you don’t have to wait on time-intensive ETL to be developed, and you can whip up a smoking-fast reporting copy of any database in a few days. We’ll call that scenario HANA-light, and I believe it is the near-term solution for most non-ginormous enterprises.
The pros of HANA-light are that we wouldn’t need to report directly off of the transactional database (since we have an up-to-the-minute copy), so we won’t impact its performance; the reporting should be way faster (since it is stored in IQ); and we won’t need very much ETL (which is a necessary but slowly developed evil). The only real con I can think of (besides additional cost) is that the semantic layer will be much tougher to build on top of the transactional schema than on a more traditional data warehouse schema of almost any type.
Why are transactional data models so hard to report off of, you say? Because application developers build (currently out of necessity) really crappy data models. Data is stored a hundred times in a hundred ways so that the user experience is fast, but that makes master data management a painful process. Data is stored in painfully normalized (or denormalized, typically whichever makes the least sense from an analytics perspective) structures, which make ugly, inefficient, and often not-reliably-correct multipass queries necessary when trying to actually analyze the data. Finally, data is stored in [please feel free to insert your own “trying to report off of a transactional database” horror story here]. The bottom line is that app guys/girls typically need to design their databases in such a way that reporting folks throw up in their mouths a little bit when they see the schema, because no matter what, the application needs to respond FAST.
(And why do we in business intelligence allow application developers to get away with this? Well, quite frankly we are red-headed stepchildren, to use the parlance of our time. The common convention is that getting the data in is the most important thing, and the data monkeys will always figure out some way to get it out. With existing databases, that is a very valid point. Also, app developers are sort of like the bass players of the IT world: they always look cool and mysterious and get all of the groupies. Sorry, I think I’m spiraling here.)
Fortunately, because HANA can query just about anything super-fast, and because HANA, being just a database, will soon be used to build applications and not just data marts, application developers will no longer be forced to create stupid data models. If the BI data modelers are brought in earlier in development, not only would agile BI be a given (even less ETL would be required, building the semantic layer would be a snap, and BI would already know the layout), but think about how clean master data management would be. Each entity or lookup value would have one record (although, admittedly, slowly changing dimensions or some other method would be needed to allow them to change over time), it could all be managed from one location, and we would KNOW that there would be no confusion!
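The post only says that slowly changing dimensions "or some other method" would let a single entity record change over time. As one illustration of that idea, here is a minimal Python sketch of a Type 2 slowly changing dimension for a hypothetical customer entity: there is exactly one current record per business key, and older versions are closed out with an end date rather than copied elsewhere. The names (CustomerDimension, CustomerVersion, valid_from, valid_to) are made up for this example and are not a HANA or SAP API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_id: int                  # business key: one entity, one id
    name: str
    region: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the single current version


class CustomerDimension:
    """Hypothetical Type 2 dimension: one current row per key, history kept by date range."""

    def __init__(self) -> None:
        self.versions: List[CustomerVersion] = []

    def current(self, customer_id: int) -> Optional[CustomerVersion]:
        for v in self.versions:
            if v.customer_id == customer_id and v.valid_to is None:
                return v
        return None

    def upsert(self, customer_id: int, name: str, region: str, effective: date) -> None:
        cur = self.current(customer_id)
        if cur is not None:
            if (cur.name, cur.region) == (name, region):
                return  # nothing changed, so no new version is written
            cur.valid_to = effective  # close out the old version
        self.versions.append(CustomerVersion(customer_id, name, region, effective))

    def as_of(self, customer_id: int, when: date) -> Optional[CustomerVersion]:
        # Point-in-time lookup: valid_from is inclusive, valid_to is exclusive.
        for v in self.versions:
            if (v.customer_id == customer_id and v.valid_from <= when
                    and (v.valid_to is None or when < v.valid_to)):
                return v
        return None


if __name__ == "__main__":
    dim = CustomerDimension()
    dim.upsert(42, "ACME", "EMEA", date(2010, 1, 1))
    dim.upsert(42, "ACME", "EMEA", date(2010, 3, 1))  # unchanged: ignored
    dim.upsert(42, "ACME", "AMER", date(2010, 6, 1))  # region moved: new version
    print(dim.current(42).region, dim.as_of(42, date(2010, 4, 1)).region)
```

Because history sits alongside the current record for the same key, a correction is made in one place; the trade-off is that point-in-time questions need the date-range filter shown in as_of.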
(Will this make apps harder to build? Maybe a little, but the best application developers will adjust very quickly. See how I just set up application developers to sound like they aren’t very good if they disagree with me? Nice, huh?)
(Also, will this make the data in the better data model more right? Of course not, but it should make it significantly easier to get right in the first place, and to correct when it gets out of sync. If I’ve got to fix bad data, I’d rather do it once and not even need to have it propagate throughout the system.)
I haven’t always been an enormous HANA fan (especially the way it was talked about at SAPPHIRE; see SAPPHIRE – It is what we thought it was), but I do think it gives us a unique opportunity not only to change the way we design an application ecosystem, but also to step back and design it correctly. That could, even if only partially realized, be a game-changer. So thank you, HANA, for totally removing the biggest barrier to entry for having a truly elegant transactional data model.
And I’m sure application developers would love to have the opportunity to make their applications elegant on the screen, in the code, AND in the database.
You know, I'd love to take advantage of a columnar DB in the ABAP code I write, but quite honestly I have no idea how to do that. Have you seen any resources for those poor devs who need to live up to your wonderful standards of adopting this new framework? With HANA having hit a few InnoJams, there must be something out there by now. Any hints? 🙂
Cheers,
Chris
P.S. Love your blog style.
http://www.zdnet.com/blog/howlett/hana-is-hereinnit/3221
Again, as I understand it, you replicate data from other source systems (R/3, BW, etc.) and then run reports off of HANA 1.0 in "real, real time". Whether this "real, real time" would create chaos in the business world or add value is something I don't know; probably it depends.
I believe you are correct: HANA 1.0 is mostly about pulling data OUT. However, the data does have to get in there somehow. Going forward, the plan is for HANA to become THE application development platform, so writing to it will have to be supported.
Whether I use ABAP (I hope this will come at some point), direct SQL, or more stored-procedure-like SQL code in HANA itself, you're right: developers need to rethink how they code and how they build data models so that they work better with a columnar DBMS.
From my research (read: the big G search engine), it _does_ matter how you code. Throw a few "select *" statements in there and you'd be better off using a traditional indexed, row-based DBMS; a columnar DBMS will actually take longer to retrieve the data.
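To make the "select *" point concrete, here is a toy Python sketch that stores the same hypothetical orders table both row-wise and column-wise and counts how many stored values each layout has to touch. It is a deliberately simplified model (no compression, indexes, or disk pages) and says nothing about HANA's actual internals. It shows why a narrow projection favors the columnar layout, and why "select *" erases that advantage while still leaving the column store the work of stitching every row back together.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def to_columns(rows: List[Row]) -> Dict[str, List[object]]:
    """Pivot row dicts into one list per column (a toy column store)."""
    columns: Dict[str, List[object]] = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def scan_rows(rows: List[Row], wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Row store: every row is read in full, then the wanted fields are projected."""
    touched = 0
    result = []
    for row in rows:
        touched += len(row)  # the whole row is read even if one field is needed
        result.append(tuple(row[c] for c in wanted))
    return result, touched


def scan_columns(columns: Dict[str, List[object]], wanted: Sequence[str]) -> Tuple[List[tuple], int]:
    """Column store: only the wanted columns are read."""
    picked = [columns[c] for c in wanted]
    touched = sum(len(col) for col in picked)
    # Tuple reconstruction: zip stitches one value from each column into a row.
    result = list(zip(*picked))
    return result, touched


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer": "ACME", "region": "EMEA", "amount": 120.0},
        {"order_id": 2, "customer": "Initech", "region": "AMER", "amount": 75.5},
        {"order_id": 3, "customer": "ACME", "region": "APJ", "amount": 310.0},
    ]
    col_store = to_columns(orders)
    everything = list(orders[0])

    for label, wanted in [("select amount", ["amount"]), ("select *", everything)]:
        row_result, row_cost = scan_rows(orders, wanted)
        col_result, col_cost = scan_columns(col_store, wanted)
        assert row_result == col_result  # both layouts return the same answer
        print(f"{label}: row store touched {row_cost} values, column store touched {col_cost}")
```

Running it reports that the row store touches 12 values for both queries, while the column store touches 3 for the single-column query and 12 for "select *". In a real engine the extra row-reassembly work is what can tip "select *" in favor of the row store.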
The problem is, I can't find any SAP resources on what makes for good coding or good data structures/design for HANA. Even looking more generally at how best to write SQL for columnar DBMSs, there isn't much out there.
So don't go dumping on us poor devs, ya hear! 😉 If someone could please enlighten us as to what it means to do good HANA-compatible design, I'll start churning it out (more to the point, I'm really interested to learn). Or is it, as Hasso mentioned in his keynote at SAPPHIRE NOW, still a learning process that we're all going to be dragged into, where at the moment we just "don't know"?
Cheers,
Chris
I'm in the process of engaging with the development team now to see if I can get a demo to you all. Stay tuned.
Nice work on a snappy, thought-provoking blog. Of course, we're all in the same boat (AppDev and BI, that is): we all want a beautifully designed, elegant and simple data model that works well for all conceivable use cases. But we rarely get that, because the real world gets in the way, and ultimately we all have to deliver something that works, even if it doesn't look very pretty.
In my experience, increases in performance are always matched by equal (or greater) increases in expectations from users. So regardless of the speed of the platform, there will continue to be a requirement to optimise the data model for specific use cases. And no doubt these optimisations will still be considered "terrible" by some.
The shackles might be coming off, but they've been replaced by an ankle bracelet. Better definitely, but we're still not completely free.
Cheers,
Jon
Regards,
Andy
SAP Geek's Blog
If I’ve got your argument right, you are hoping that HANA will do away with these problems and leave the BI guys/gals with a “better” base from which to work their magic.
Now, I’m no expert in HANA or indeed in what SAP’s plans are in this area, but what I do know, because we have technology in this area, is that the SAP “data model” is something to behold. Truly, it is a work of art, of a scale and complexity that puts it into a hall of fame that only it can inhabit.
We spend our lives helping BI developers interpret this OLTP-optimised data model (I’ve blogged on this here: http://silwoodtechnology.wordpress.com/).
I guess the reality is that SAP are not going to change this data model for HANA any time soon, but if anybody does know more about this, I’d be interested to hear it.
I doubt SAP will be changing their core data model anytime soon, but each new application they build provides an opportunity to optimize not only performance but also the elegance of the system. A couple of very small changes that HANA could enable could save their customers literally millions of dollars in workarounds, and SAP is definitely clever enough to capture some of that value.