SAP’s HANA: Removing the Shackles from Terrible Data Design
I’ve been talking a lot at work lately about Agile BI. The solution I’ve been trying to “sell” internally is to use Sybase Replication Server to make real-time copies of our transactional systems into a Sybase IQ-powered operational data store, with minimal data cleanup (for more of my thoughts, see Sybase Love Fest Part 1 – Data Solutions). I’d build the lion’s share of the semantic layer right on top of this copy and only use traditional ETL for the most complex and least performant use cases. This makes BI “agile” because you don’t have to wait on time-intensive ETL to be developed, and you can whip up a smoking-fast reporting copy of any database in a few days. We’ll call that scenario HANA-light, and I believe that is the near-term solution for most non-ginormous enterprises.
The Pros of HANA-light are that we wouldn’t need to report directly off of the transactional database (since we have an up-to-the-minute copy) so we won’t impact performance on it, the reporting should be way faster (since it is stored in IQ), and we won’t need very much ETL (which is a necessary but slowly-developed evil). The only real Con I can think of (besides additional cost) is that the semantic layer will be much tougher to build on top of the transactional schema than on a more traditional data warehouse schema of almost any type.
Why are transactional data models really hard to report off of, you say? Because application developers build (currently out of necessity) really crappy data models. Data is stored a hundred times in a hundred ways so that the user experience is fast but the master data management is a painful process. Data is stored in a painfully normalized (or denormalized, typically whichever makes the least sense from an analytics perspective) form, which makes ugly, inefficient, and often not-reliably-correct multipass queries necessary when trying to actually analyze the data. Finally, data is stored in [please feel free to insert your own “trying to report off of a transactional database” horror story in here]. The bottom line is that app guys/girls typically need to design their database in such a way that reporting folks throw up in their mouths a little bit when they see the schema, because no matter what, the application needs to respond FAST.
(And why do we in business intelligence allow application developers to get away with this? Well, quite frankly we are red-headed stepchildren, to use the parlance of our time. The common convention is that getting the data in is the most important thing, and the data monkeys will always figure out some way to get it out. With existing databases, that is a very valid point. Also, app developers are sort of like the bass players of the IT world: they always look cool and mysterious and get all of the groupies. Sorry, I think I’m spiraling here.)
Fortunately, because HANA can query just about anything super-fast, and because HANA (being just a database) will soon be used to build applications and not just data marts, application developers are no longer forced to create stupid data models. If the BI data modelers are brought in earlier in development, not only would agile BI be a given (even less ETL would be required, building the semantic layer would be a snap, and BI would already know the layout), but think about how clean the master data management would be. Each entity or lookup value would have one record (although admittedly slowly-changing dimensions or some other method would be needed to allow for them to change over time), it could all be managed from one location, and we would KNOW that there would be no confusion!
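For the non-data-monkeys in the audience, here is a rough sketch of what "one record per entity, but allowed to change over time" can look like, using a type 2 slowly-changing dimension. Everything in it is made up for illustration: the customer_dim table, its columns, and the set_region and region_on helpers are hypothetical, and it runs on Python's built-in sqlite3 module rather than HANA or IQ, purely so it can be tried anywhere.

```python
import sqlite3

# Hypothetical example schema: one row per version of a customer's region.
# valid_to IS NULL marks the single current version of each customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_id INTEGER NOT NULL,
        region      TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date, inclusive
        valid_to    TEXT               -- ISO date, exclusive; NULL = current
    )
    """
)


def set_region(conn, customer_id, region, as_of):
    """Close the current version (if any) and insert the new one."""
    conn.execute(
        "UPDATE customer_dim SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (as_of, customer_id),
    )
    conn.execute(
        "INSERT INTO customer_dim VALUES (?, ?, ?, NULL)",
        (customer_id, region, as_of),
    )


def region_on(conn, customer_id, day):
    """Return the region that was in effect on the given ISO date, or None."""
    row = conn.execute(
        "SELECT region FROM customer_dim "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, day, day),
    ).fetchone()
    return row[0] if row else None


def current_row_count(conn, customer_id):
    """Count the rows flagged as current; a healthy dimension has exactly one."""
    return conn.execute(
        "SELECT COUNT(*) FROM customer_dim "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (customer_id,),
    ).fetchone()[0]


# Customer 42 starts in East, then moves to West on June 1.
set_region(conn, 42, "East", "2011-01-01")
set_region(conn, 42, "West", "2011-06-01")
```

The point of the sketch is that a correction happens in exactly one place: there is only ever one current row per customer, and history is answered by date range instead of by hunting through copies scattered across the schema.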
(Will this make apps harder to build? Maybe a little, but the best application developers will adjust very quickly. See how I just set up application developers to sound like they aren’t very good if they disagree with me? Nice, huh?)
(Also, will this make the data in the better data model more right? Of course not, but it should make it significantly easier to get right in the first place, and to correct when it gets out of sync. If I’ve got to fix bad data, I’d rather do it once and not even need to have it propagate throughout the system.)
I haven’t always been an enormous HANA fan (especially when it is talked about at the SAPPHIRE – It is what we thought it was) but I do think it actually gives us a unique opportunity to not only change the way we design an application ecosystem, but also to step back and design it correctly. Now that could, if even partially realized, be a game-changer. So thank you, HANA, for totally removing the biggest barrier to entry for having a truly elegant transactional data model.
And I’m sure application developers would love to have the opportunity to make their applications elegant on the screen, in the code, AND in the database.