
In this installment of A Deeper Look At Sybase, we’ll take a look at Sybase IQ, Sybase’s analytics database.

   

Somewhere around 1994, Sybase acquired a company named Expressway that had taken a novel approach to handling analytics-type workloads (although the word ‘analytics’ did not really exist in today’s sense; ‘data warehousing’ was the buzzword of the time, and ‘business intelligence’ was something you had as a person rather than as an IT system).
Their principal innovation was to physically store data not by row, but by column — despite fully supporting the conceptually row-oriented ANSI SQL standard.
Sybase renamed the product “IQ” (for “Intelligent Query”, should you wonder) and developed it into the powerful analytics engine it is today.

While column-oriented databases have been all the rage in recent years, the concept was far from common in those days. Sybase recognized its potential for analytics, and IQ became the first commercially available column database.

 

The underlying idea of a column database is that analytics-type queries rarely touch all columns of a table — typically only a few — but do so for many rows in the table. By storing all values of a particular column together, the disk I/O required for, say, an aggregation on a column becomes much more efficient than when the data is stored as rows.
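To make this concrete, here is a deliberately simplified in-memory sketch in Python (nothing like IQ’s actual storage engine, of course): summing one column in a column-oriented layout touches only that column’s values, while a row-oriented layout has to walk every full row.

```python
# Simplified sketch: row-oriented vs column-oriented storage.
# Illustrative only; IQ's real on-disk format is far more sophisticated.

rows = [
    ("widget", "red", 10),
    ("gadget", "blue", 25),
    ("widget", "red", 5),
]

# Row-oriented: to sum the 'quantity' column we must scan every full row.
total_row_store = sum(r[2] for r in rows)

# Column-oriented: each column is stored contiguously, so the aggregation
# reads only the one column it needs.
columns = {
    "item":     [r[0] for r in rows],
    "color":    [r[1] for r in rows],
    "quantity": [r[2] for r in rows],
}
total_col_store = sum(columns["quantity"])

assert total_row_store == total_col_store == 40
```

On disk, the difference matters even more than in memory: the column store reads a fraction of the pages the row store would.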

 

Once the data is stored column-wise, various optimizations become possible that boost performance further. For example, column values can be encoded in a small number of bits, with a lookup table holding the actual values. Imagine a column holding the color of an item (“red”, “blue”, “yellow”, etc.), and that no more than 14 different colors occur in the actual data. Instead of storing the individual color values, IQ creates a lookup table of those 14 color values and encodes each value in 4 bits (enough for 16 values). Since there are many duplicate color values, this encoding saves space, and therefore improves the efficiency of disk I/O and memory usage. Moreover, the series of 4-bit values can often itself be compressed quite well, bringing further efficiency. When running a query involving the color column, the encoded 4-bit value is used in processing rather than the actual color value; only when data is returned to the client is the actual color value substituted back into the result set.
In a nutshell, this is the basic idea of a column-oriented database.
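The dictionary-encoding idea described above can be sketched in a few lines of Python (a minimal illustration; IQ’s actual encoding, bit-packing, and compression are considerably more sophisticated):

```python
# Minimal sketch of dictionary (lookup-table) encoding for a color column.
colors = ["red", "blue", "yellow", "red", "red", "blue"]

# Build the lookup table: one entry per distinct value.
lookup = sorted(set(colors))                     # ['blue', 'red', 'yellow']
code_of = {value: i for i, value in enumerate(lookup)}

# Encode: each value becomes a small integer code. With at most 16
# distinct values, each code fits in 4 bits instead of a full string.
encoded = [code_of[v] for v in colors]

# Queries can run directly on the codes, e.g. "count rows where color = 'red'":
red_count = sum(1 for c in encoded if c == code_of["red"])

# Decoding happens only when results are returned to the client.
decoded = [lookup[c] for c in encoded]
assert decoded == colors
assert red_count == 3
```

Note that the comparison in the query never touches a string: it compares small integer codes, which is exactly why processing on encoded data is cheap.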

 

 

Despite storing data in a column-oriented manner, from the outside it looks as if IQ operates on rows: IQ supports ANSI SQL with full transactional semantics. When needed, a row is reconstructed on-the-fly from the individual column values.
Breaking up a row into columns and reconstructing it takes CPU cycles. The compression and decompression of data is also CPU-intensive. Indeed, IQ was designed to be hungry for CPU and memory. This is different from OLTP-oriented architectures like Sybase ASE, which are essentially optimized for disk I/O bandwidth.

 

A consequence of IQ’s column-oriented nature is that bulk operations are far more efficient than operations on individual rows. Simply put: updating 1 million rows in a single update statement is very fast in IQ, but if you update those rows one by one with an individual update each, you could go off and play a game of golf while it’s running — with time left for lunch. Performing many repeated single-row operations should basically be considered a mistake in IQ.
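The pattern to internalize is set-based statements instead of loops. Here is a sketch using Python with SQLite (not IQ, obviously — SQLite is row-oriented — but the SQL shape of the anti-pattern and its fix is the same):

```python
import sqlite3

# Illustrative sketch: one set-based statement vs. many single-row statements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(i, 100) for i in range(1000)])

# Anti-pattern in a column store: one UPDATE statement per row.
for i in range(1000):
    conn.execute("UPDATE sales SET amount = amount + 1 WHERE id = ?", (i,))

# Preferred: a single bulk UPDATE touching all rows at once.
conn.execute("UPDATE sales SET amount = amount + 1")
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
assert total == 1000 * 102  # every row was incremented twice
```

In IQ, the per-row loop pays the row-reconstruction and versioning overhead a thousand times over; the single statement pays it once.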

Another area where Sybase IQ really shines is in bulk-loading of data: as long as you load a lot of data in one operation, it will be very fast. At Sybase we’ve seen load speeds of up to 500GB/hour; loading 1 million rows (or more) per second is typical. Moreover, you can bulk-load data into IQ while your queries continue running — that may sound obvious, but judging by how some competitors perform, it is not.

 

 

 

IQ’s column storage and its additional compression techniques bring some interesting advantages. One phenomenon that often baffles new IQ customers is that the volume of the raw data actually shrinks once it’s loaded into IQ — compression kicks in just by loading it. Compression rates vary, but it is not uncommon to see a reduction to 20-50% of the original raw data size.
A great example is the world record for the world’s largest data warehouse, which Sybase achieved in partnership with Sun Microsystems (when they were still their own company). In this case, one Petabyte of raw data was loaded into IQ and occupied only 260 Terabytes of actual storage.

This compression aspect is where column databases in general, and IQ in particular, have an edge over row-oriented databases. If you want to run Business Intelligence-type applications on a row-oriented database, you will inevitably need to add indexes and pre-aggregate data (creating the ‘cube’). These structures take additional storage space — so when you compare such a row-oriented data-warehouse-with-aggregates-and-indexes with the same database in Sybase IQ, the difference in storage size can easily reach a factor of 10 or more. By the time we’re talking about Terabytes of data, such storage savings are not just convenient; they represent real money. As a bonus, they have allowed the Sybase marketing department to push the notion that IQ is “green” technology (because you need less disk storage and less electricity, thus, supposedly, melting less of the polar ice caps. Who would not want to feel good about their data warehouse?)

   

 

By default, IQ is a stand-alone single-process database server running on a single host. For scaling to larger numbers of users, IQ also comes in a clustered version known as “IQ Multiplex” (as opposed to “Simplex” for the default single-process server). In Multiplex, multiple IQ server processes run on clustered hardware (like a set of Linux blades on a backplane).
Architecturally, IQ Multiplex is a hybrid cluster. On the one hand it is a shared-disk architecture in the sense that all cluster nodes can access all data. On the level of the data caches of the Multiplex nodes, it is a shared-nothing cluster in that the cluster nodes do not coordinate modifications to cached data directly between each other (a different mechanism is used, which is out of scope for this article).

 

 

IQ has some great success stories. One of the better-known customers is Nielsen Media (the folks doing the TV viewer ratings in the USA), who have appeared on stage at past Sybase Techwave conferences. The amount of data they must crunch is huge, and on very short timescales — and for many years, they’ve been doing that with IQ. Another interesting factoid is that *all* insurance companies in South Korea use IQ – the competitive edge provided by IQ must have been irresistible. And there’s a story about a major airline that started using IQ for fraud detection, and discovered entirely new types of fraud they didn’t yet know about. For more customer success stories with IQ, see http://www.sybase.com/detail_list?id=73623

 

 

It is worth noting that Sybase IQ is proven technology with a long track record. Even though every database startup these days seems to be doing a column database, there are typically big practical differences with IQ. A particularly telling test is to try running some complex queries concurrently while loading data into the tables being queried (that’s what you do when you refresh your data warehouse while applications continue working — real-time BI, in other words). Many new arrivals on the column database market will have a hard time with such a test, but IQ manages very well.

 

IQ tends to be especially cost-effective on Linux — but it is of course available on all major platforms.

The current version of IQ is 15.2. The next version, 15.3, has been announced for release in the summer of 2011.

 

More information about IQ, as well as a downloadable evaluation version, can be found at www.sybase.com/iq.

 

  Next topic: A Deeper Look At Sybase: Replication Server.


6 Comments


  1. Erika Atencio
    Hi, Rob. I’m a Sybase IQ user and I can tell you it’s nice to work with because it can handle a lot of data pretty fast, and what I love most is that it’s not hard to manage.

  2. Martin English
    Hi Rob,
      While IQ supports ANSI SQL with full transactional semantics, the extra CPU and memory needed to, for example, reconstruct rows on-the-fly implies a totally different mind-set is required for tuning columnar databases compared to row databases (i.e. all the others that a BASIS guy would see). Even though the SQL engine is (hopefully!) designed to allow for this, my first thoughts are that SQL needs to be written with different constraints in mind.

    For example, since reading and buffering all fields of an entire record takes longer in Sybase IQ, all SQL should request only the necessary columns – any SQL that requests more columns than necessary should be rewritten. I’d like your comments on this line of thinking, please.

    BTW, I’ve downloaded a couple of documents from http://www.sybase.com/ that people may find useful;
    http://www.sybase.com/files/White_Papers/ASE-IMDB-Performance-WP.pdf
    and
    http://www.sybase.com/files/White_Papers/Sybase_SUP_1.5.2_Perf_and_Tuning_WP.pdf

    Would these be suitable for a first time introduction ?

    1. Rob Verschoor Post author
      Functionally, IQ gives exactly the same results as Sybase ASE (Sybase’s row-oriented OLTP database) or any other ANSI-compliant DBMS, for that matter.

      When it comes to writing SQL for IQ, this is actually mostly worry-free. Your typical analytic query does not access all columns, and normally it runs just fine on IQ. In fact, IQ is exceptionally good at handling ad-hoc queries that have been written without any specific IQ considerations in mind (this aspect, where IQ allows you to save on specialised development effort, is something that we have never emphasised much at Sybase, unfortunately).

      Clearly there are some recommended best practices for writing SQL (use bulk operations and avoid 1-by-1 algorithms; selecting all columns of a very wide table does indeed carry some overhead), but in IQ, tuning really works the other way around: based on the type of queries that you have, you decide which indexes to create (IQ has a bunch of different types).
      Or, you can let IQ figure it out: just run your queries and then do what the IQ index advisor tells you — there is no more cost-effective approach than this. IQ is also highly autonomous and requires very little DBA effort in comparison with other databases.

      On the two whitepapers you mention: these are unrelated to IQ. The first is about one particular feature in ASE 15.5 (the in-memory DB), the second is about the Sybase mobility products.
      For IQ-specific whitepapers, go to http://www.sybase.com/iq.

      HTH,

      Rob V.

  3. Maksim Navolochka
    Hi, Rob.
    I’m a happy reader of your “Tips, tricks and recipes for ASE” book, great thanks for it.
    Are you going to write something similar about IQ?
    1. Rob Verschoor Post author
      I’m glad you find the Tips&Tricks book useful!
      Right now, I am working on a quick reference guide for IQ. I hope to have this ready before the end of this year. Keep an eye on http://www.sypron.nl, or subscribe to my mailing list (www.sypron.nl/notify) for updates.
  4. Neal Stack

    I witnessed the difference between bulk and single delete statements this week.  One of the products I support was sending 35000 individual delete statements to IQ.  It took about 16 hours to complete (there were similar activities occurring against 70 other tables during this period).

    The solution for my product was to use the client-side load table command to populate a temporary table with the key values of the rows we want deleted, then issue a delete statement that uses a join between the main table and the temporary table to delete rows.  By following this approach, the delete was nearly instantaneous.

