I've been watching SAP's acquisition of Sybase pretty closely, but not because of their Sybase Love Fest Part 2 - Mobility (which seem to have everyone else pretty excited). I've been more interested in their data warehouse tools, specifically Sybase IQ and Sybase Replication Server.
What are the pieces?
The tools themselves are pretty straightforward.
- Sybase Replication Server basically tracks relational database changes by reading the logs created as rows are updated. The tool works on practically every major database platform and "replicates" those changes to a copy of the source database in real-time (or at least as close to real-time as you can conceivably get). The tool does allow for a limited amount of data clean-up/transformation, but its secret sauce is really to get the data out of the transactional system so that you can analyze it without impacting performance on the actual transactional system. Once you have the copy you can certainly still transform it, but you aren't limited to "normal" ETL windows (in the middle of the night) because playing with the copy of the data doesn't impact the source system at all beyond the standard logging which your database platform already provides (if you don't already have that turned on, I'm assuming you probably should). And, because you say "go copy... that" you don't need extensive ETL resources to get it up and running.
- Sybase IQ is a columnar-based database (as opposed to a relational database, which is row-based) that allows for super-fast query times. Although it doesn't really have a robust query interface of its own, it can be used as the source for pretty much anything that can read a more traditional database -- Crystal Reports, an SAP BusinessObjects universe, etc. The performance of this database is at least comparable with other vendors, but Sybase stands out because it can sit on commodity hardware so you don't need to have a physical data warehouse "appliance" to speed up your queries. It also scales linearly with whatever hardware you have, so if you need more data stuffed into there (at a highly compressed rate, I might add) you just add more hardware. Finally, because the tool figures out its own appropriate indexes, you don't need a ton of DBA time to manage the whole system.
So what?
All of that is obviously cool, but what does that actually mean? To me, the combination means we are considerably closer to Agile BI. Most operational reports need to be run against real-time data, which in the past has meant running against your transactional database, which slows it down. Most analytical data, because it is generally transformed and massaged for performance (amongst some other reasons) cannot be real time. But if we could always have a real-time copy of a transactional system that would spit back complex queries super-freaking fast, we might not necessarily need all of the overhead of ETL. We build the transformation logic into the semantic layer (which, thanks to BI4 can be largely deployed anywhere) and voila, we have an instant operational data store (ODS).
More good stuff
- You can build serviceable ODS's very quickly.
- You don't need to invest a lot of time into building or maintaining indexes and ETL.
- Your analysts don't need to learn multiple data structures (since you use the same one they are familiar with in the transactional system twice).
- Both your reports AND your transactional system should be faster.
Gotchas
- Replication Server is really good at replicating data. If you have bad data, you'll now have twice as much.
- You'll still most likely want a data warehouse for long term trends, archiving, yada yada.
- This is NOT a master or metadata solution.
Buzzword hat tip
Does this really get us closer to "Agile BI"? I think so. We constantly talk about all of the new data that exists out there, but we don't often talk about how to get there. The old data warehousing techniques are fantastic at enabling analysis on cleansed, consolidated, aggregated data from static sources, but with their strict data models and heavy ETL they just aren't very nimble. Because data is becoming agile (and big, and everchanging) BI needs to become agile. ETL as it stands now is NEVER going to be agile, but with these Sybase tools we can grab the data from wherever it is and be far more agile with the new tools in BI4 for building a semantic layer and integrating it everywhere.