I’ve been watching SAP’s acquisition of Sybase pretty closely, but not because of their Sybase Love Fest Part 2 – Mobility (which seems to have everyone else pretty excited). I’ve been more interested in their data warehouse tools, specifically Sybase IQ and Sybase Replication Server.

What are the pieces?

The tools themselves are pretty straightforward.

  • Sybase Replication Server tracks relational database changes by reading the logs created as rows are updated. It works on practically every major database platform and “replicates” those changes to a copy of the source database in real time (or at least as close to real time as you can conceivably get); a toy sketch of the idea follows this list. The tool does allow a limited amount of data clean-up/transformation, but its secret sauce is really getting the data out of the transactional system so that you can analyze it without impacting performance on the transactional system itself. Once you have the copy you can certainly still transform it, but you aren’t limited to “normal” ETL windows (in the middle of the night), because playing with the copy doesn’t touch the source system at all beyond the standard logging your database platform already provides (and if you don’t have that turned on, you probably should). And, because you just say “go copy… that”, you don’t need extensive ETL resources to get it up and running.
  • Sybase IQ is a columnar database (it stores data by column rather than by row, the way a traditional relational database does), which allows for super-fast query times; the second sketch after this list shows the layout difference. Although it doesn’t really have a robust query interface of its own, it can be used as the source for pretty much anything that can read a more traditional database: Crystal Reports, an SAP BusinessObjects universe, etc. Its performance is at least comparable with other vendors’, but Sybase stands out because IQ runs on commodity hardware, so you don’t need a physical data warehouse “appliance” to speed up your queries. It also scales linearly with whatever hardware you have, so if you need to stuff more data in there (highly compressed, I might add) you just add more hardware. Finally, because the tool figures out its own appropriate indexes, you don’t need a ton of DBA time to manage the whole system.
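
To make the log-reading idea concrete, here is a minimal Python sketch, not anything resembling Replication Server’s actual implementation: a list of made-up change records stands in for the transaction log, and a dictionary stands in for the reporting copy that the changes get replayed against.

```python
# Toy sketch of log-based replication: the real Replication Server reads the
# database transaction log; here a simple list of change records stands in for
# that log, and a dict stands in for the reporting copy of the table.

change_log = [
    {"op": "insert", "key": 101, "row": {"customer": "Acme", "total": 250.0}},
    {"op": "update", "key": 101, "row": {"customer": "Acme", "total": 300.0}},
    {"op": "insert", "key": 102, "row": {"customer": "Globex", "total": 75.0}},
    {"op": "delete", "key": 102, "row": None},
]

replica = {}  # the copy you point your reports at, instead of the source system

def apply_change(copy, change):
    """Replay one logged change against the reporting copy."""
    if change["op"] in ("insert", "update"):
        copy[change["key"]] = change["row"]
    elif change["op"] == "delete":
        copy.pop(change["key"], None)

for change in change_log:          # in practice this runs continuously,
    apply_change(replica, change)  # keeping the copy near real time

print(replica)  # {101: {'customer': 'Acme', 'total': 300.0}}
```
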
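And to show why the columnar layout matters for query speed, here is a second toy sketch (the table and column names are made up): summing one measure in a column layout only touches that column, while the row layout drags every field of every row along.

```python
# Toy contrast between row-based and column-based storage. A column store like
# Sybase IQ keeps each column together, so an aggregate over one column reads
# far less data than scanning whole rows; this sketch only shows the layout idea.

# Row-based layout: one record per sale, every query reads whole rows.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 200.0},
]

# Column-based layout: one list per column, stored (and compressed) together.
columns = {
    "order_id": [1, 2, 3],
    "region":   ["East", "West", "East"],
    "amount":   [120.0, 80.0, 200.0],
}

# Total sales in the row layout: every field of every row comes along for the ride.
total_rows = sum(r["amount"] for r in rows)

# Total sales in the column layout: only the "amount" column is read.
total_columns = sum(columns["amount"])

assert total_rows == total_columns == 400.0
```
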

So what?

All of that is obviously cool, but what does it actually mean? To me, the combination means we are considerably closer to Agile BI. Most operational reports need to run against real-time data, which in the past has meant running against your transactional database and slowing it down. Most analytical data, because it is generally transformed and massaged for performance (amongst other reasons), cannot be real time. But if we could always have a real-time copy of a transactional system that would spit back complex queries super-freaking fast, we might not need all of the overhead of ETL. We build the transformation logic into the semantic layer (which, thanks to BI4, can largely be deployed anywhere) and voila, we have an instant operational data store (ODS). A quick sketch of that query-time transformation follows.
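
Here is a hypothetical sketch of what that looks like, with made-up table and field names: the replicated data stays raw, and the “transformation” is just a function applied at query time, roughly the way a semantic-layer object would derive a report-ready field, instead of a nightly ETL load.

```python
# Hypothetical sketch of pushing transformation into the semantic layer instead
# of a batch ETL job: the replicated table stays raw, and business-friendly
# fields are derived at query time. Table and field names here are made up.

replicated_orders = [
    {"order_id": 1, "cust_id": "C01", "amt_gross": 120.0, "tax": 12.0},
    {"order_id": 2, "cust_id": "C02", "amt_gross": 80.0,  "tax": 8.0},
]

def semantic_view(order):
    """Derive report-ready fields on the fly, no separate warehouse load."""
    return {
        "Order Number": order["order_id"],
        "Customer":     order["cust_id"],
        "Net Amount":   order["amt_gross"] - order["tax"],
    }

# Reports query the view, not a separately loaded warehouse table, so the
# numbers are as current as the replica itself.
for record in map(semantic_view, replicated_orders):
    print(record)
```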

More good stuff

  • You can build serviceable ODSs very quickly.
  • You don’t need to invest a lot of time into building or maintaining indexes and ETL.
  • Your analysts don’t need to learn multiple data structures (the copy uses the same structure they already know from the transactional system).
  • Both your reports AND your transactional system should be faster. 

Gotchas

  • Replication Server is really good at replicating data. If you have bad data, you’ll now have twice as much.
  • You’ll still most likely want a data warehouse for long term trends, archiving, yada yada.
  • This is NOT a master data or metadata management solution.

Buzzword hat tip

Does this really get us closer to “Agile BI”? I think so. We constantly talk about all of the new data that exists out there, but we don’t often talk about how to get at it. The old data warehousing techniques are fantastic at enabling analysis of cleansed, consolidated, aggregated data from static sources, but with their strict data models and heavy ETL they just aren’t very nimble. Because data is becoming agile (and big, and ever-changing), BI needs to become agile too. ETL as it stands now is NEVER going to be agile, but with these Sybase tools we can grab the data from wherever it lives, and the new tools in BI4 let us build a semantic layer on top of it and integrate it everywhere.

Comments

  1. Dan Grew
    Have you had a chance to test either of these products? I’d be interested to find out how easy the replication server is to use and how the performance of IQ compares to relational databases and/or appliances.
    1. James Oswald (post author)
      Thanks, Dan. Unfortunately I haven’t gotten my hot little hands on this stuff, but I’m trying my darndest to. I’ll report back when I know more.

      As far as speed of setup goes, they really make it sound like the systems can be stood up in a few days, and (if you aren’t doing anything fancy) you can just point at the source database, copy it into IQ, and start getting a 10-100x performance improvement in no time.

      1. Erika Atencio
        Yes, that’s true, the systems can be set up in a few days. Last year I worked on a project that went pretty quickly thanks to Sybase IQ and to end users who really knew what they wanted in their data warehouse.

        Sybase has a kind of structure called proxy tables that lets you point at the source database easily. I don’t know whether non-Sybase databases have other ways to achieve this, since my experience is mainly with Sybase products, but I can tell you that proxy tables are great to work with.
