Skip to Content

Are you new to SAP Sybase IQ?  Have you used the product but are still not clear on what indexes to use or when to use them?  Have you just migrated to SAP Sybase IQ 16 and are wondering if there are any indexing changes?  This blog will discuss indexes and strategies for SAP Sybase IQ.

Basic Indexes

When creating a table in IQ, a default index is placed on every column.  This index is referred to as a Fast Projection (FP) index.  In order to enable optimized FP indexes, one must set the Minimize_Storage option to ON.

set option public.minimize_storage=’on’;

From this point, all new tables and columns will use n-bit encoding for highly optimized and compressed data storage and access.  Each FP index will be encoded as 1 to 31 bits depending on the data cardinality (2^1 to 2^31 distinct values).

Advanced Indexes

In addition to the basic FP indexes, IQ supports nine other indexes:

  • High Group (HG) – used for join columns and searchable columns whose data has more than 2,000 distinct values
  • Low Fast (LF) – used for searchable columns whose data has fewer than 2,000 distinct values
  • High Non Group (HNG) – used for range searches (except for date, time, and datetime data types) and for single column aggregations
  • Word (WD) – Use for whole word searching of character based data
  • Text (TEXT)– Use for full text searching of character based data
  • Date (DATE) – Use for fields that have a DATE data type (ranges and datepart queries)
  • Time (TIME) – Use for fields that have a TIME data type(ranges and datepart queries)
  • Datetime (DTTM) – Use for fields that have a DATETIME/TIMESTAMP data type (ranges and datepart queries)
  • Compare (CMP) – Use for ore-computing the comparison of two columns in the same table

When are indexes used?

In most relational database engines, the optimizer chooses the best single index for a given table in the query.  SAP Sybase IQ takes the optimization process to the next level by looking not only at each column in the query, but at all joins and search arguments.  For example, assume that a date column is used in a join and to search for the month datepart.  The optimizer would prefer to use the HG index on the date field for the join.  It would also prefer to use the DATE index to search on the month datepart.

In short, IQ will look at each join, where clause, aggregation, etc. to determine the best index for each operation.  Not just one index per table and not just one index per column.  But rather, one index for each use of each column in the query!

Should we be concerned about over indexing?

The short answer is an emphatic NO!  As mentioned previously, typical relation database engines pick the best single index to access data in a table.  Second, when adding indexes to most RDBMS engines the load times tend to slow down due to how data is processed.  Third, most database engines store data and indexes in different structures so that as more indexes are added the table size swells.  All of these lead to trying to minimize the total number of indexes on a table in most engines.

SAP Sybase IQ is not prone to these limitations.  We pick as many indexes as needed to process the entire query.  Indexes in IQ are not as large as the data being indexed.  Each of the indexes employs various levels of vectors, bitmaps, data representations, and compression to dramatically decrease the database size.  Finally, when loading data, IQ can consume all CPU resources.  We currently released a fully audited benchmark in which SAP Sybase IQ loaded data at over 34.3 TB/hour on an 80 core Linux system (http://scn.sap.com/community/developer-center/analytic-server/blog/2013/05/30/sap-sybase-iq-16-sets-guinness-world-record-with-343tb-per-hour-for-big-data-loading).

All of these SAP Sybase IQ features lead to an engine that can handle as many indexes as necessary for all query (canned or ad hoc) activity.  It is quite common to have 30-60% of all columns indexed with the advanced indexes (and in some cases multiple indexes per column).

When to use Indexes?

Now that we have some basics about indexes, the big question is when to use them? Here are some simple guidelines that will help cover every situation.  The general idea is to ask a series of questions about each column.  The answers to the questions will dictate whether or not indexes are needed, and if needed which one(s) are best.

  • Will this column be used as a JOIN clause?
    • NO: Continue to next question
    • YES: Use an HG index
  • Will this column be used as a search column in the WHERE clause?
    • NO: Continue to next question
    • YES: Use an HG index if the column has more than 2,000 distinct values.  If less than 2,000 distinct values use an LF.
  • Is the column data type a DATE, TIME, TIMESTAMP, or DATETIME?
    • NO: Continue to next question
    • YES: Use a DATE, TIME, or DTTM index.  All date/time/datetime fields should have this index as systems quite frequently use date fields for searching, filtering, and aggregations.
  • Will whole word searching be done on the character data?
    • NO: Continue to next question
    • YES: Use WD (WORD) index
  • Will text searching be done on the character data?
    • NO: Continue to next question
    • YES: Use a TEXT index
  • Will this column be compared to another column in the same table?
    • NO: Continue to next question
    • YES: Use a CMP (compare) index that includes both columns
  • Will this column be used for range searches?
    • NO: Continue to next question
    • YES: Use an HNG (high non group) index
  • Will this column be used in aggregations with no other columns?
    • NO: Continue to next question
    • YES: Use an HNG (high non group) index

There are two types of indexes in IQ: those that have detailed statistics and those that have minimal statistics.  The HG, LF, TEXT, and WD indexes contain detailed statistics.  The HNG, CMP, DATE, TIME, and DTTM indexes contain minimal statistics.  After walking through the above checklist, you may find that an index with minimal statistics was all that was warranted for the column.  Should that situation arise, it is also recommended that an index with detailed statistics is placed on that column.  Use an HG or LF index for these detailed statistics.  The choice will depend on the current or expected cardinality of data in the column.

The process to choose indexes may seem daunting especially if the data model has a significant number of tables and columns.  The process outlined above assumes that very little is known about the application and its use of the data.  This is, generally, not the case as most DBAs and data modelers do know more about the data and its use.

Should you know more about the model and data, we can simplify the process.  I prefer to look at columns that are known to be used in WHERE clauses of most queries and reports.  Indexes on join and search criteria are the single biggest factor in performance, so focusing solely on those will allow IQ to be indexed for nearly every situation.

Certainly some columns and indexes will be missed with this approach.  However, since the focus was on columns that are heavily used, the impact of missing a few minor, or less frequently used, columns should be relatively low.

Lastly, know that IQ provides a safety net for us!  IQ has an index advisor that can be enabled by setting the option Index_Advisor to ON.

set option public.Index_Advisor=’on’;

Once on, IQ will print index advice to query plans (if they are on) as well as to a system table that can be accessed via the stored procedure sp_iqindexadvice.  The index advisor output will note under what circumstance the index is needed.  The stored procedure will also aggregate the number of times the advice was made.  Should that procedure be used to trend data over a period of time (days, weeks, even months) the number of times the same advice was made can help focus efforts on the most recommended advice.

To report this post you need to login first.

5 Comments

You must be Logged on to comment or reply to a post.

  1. Martin Balla

    Hello Mark,

    i have 2 notes to your good introduction into Sybase IQ.

    The Option public.minimize_storage=’on’ is related to Sybase IQ 15 to enable the FP1,2 or 3 Byte encoding of the datacolumn if possible. Depending on the cardinality of the datacolumn.

    In Sybase IQ 16 we have the new technology of bitencoding and so there is no need anymore for that option.

    My experience with indexes on Sybase IQ is:

    You can archieve a lot of query speed when you have a optimized datamodel and queries that avoid joins between the columns. For IQ its better to have a maybe more denormalized db schema with a large amount of redundant entries in the column. The very good compression technologie can deal with that amounts of data. In this situation no advance can be expected from indexes.

    If you have an strongly normalized datamodel with many relations in it, as it would be good for row store OLTP dataserver, you can expect poor performance from the Sybase IQ. Indexes can help you in this situation a little bit, as you mentioned it in your article.

    regards

    Martin Balla

    (0) 
    1. Mark Mumy Post author

      Martin,

      Good points.  But keep in mind two things with respect to the default indexes.  First,not all IQ 16 servers will use nbit indexes.  If a customer does an in-place migration to IQ 16 the FP indexes (fp1, fp2, fp3, and flat-fp) are still in use for existing AND new tables.  An in-place migration puts a lot of new IQ 16 features on hold until the DBA specifically changes them. Second, not all customers are using IQ 16 yet.  Using the Minimize_Storage option in a newly built IQ 16 instance does no harm.  What it does, though, is catch any user on any version of IQ that wants to use optimized FP indexes in any form (1-3 byte, or the new n-bit).

      Certainly reducing the joins in IQ can have performance impacts and make things run faster.  Historically, this was a practice that I widely used.  More recently, though, we have enhanced the product to where this is not needed nearly as much. 

      In my opinion, this isn’t a firm rule, either.  Denormalization of the model carries some massive changes to applications: ETL, queries, reports, end users must all be made aware of the changes so that the code and processes can change.  Quite a few customers prefer to keep the model intact so that users and applications have little to no changes to make from a modeling perspective.  This approach brings in some performance improvements but does leave room for a lot more as the system matures.  Over time when more performance is needed out of the system, investigating model changes for performance reasons is warranted.

      (0) 
      1. Martin Balla

        Thanks Mark for your response,

        yes i agree with your explanations. It’s my experience with some customers who want’s migrate from a row to a column store DB without changing ETL Processes and Schema that the performance of the IQ is often below the performance expectations. In this case some indexes had helped a little.

        We’re fortunately often in the situation to have the possibility to define a schema and a load process only under performance aspects in several Green IT and industrial research and test programs we participate. We did a lot of research to optimize the query efficiency on small hardware. IMO an optimized schema and a large page size has the biggest performance impact.

        greetings Martin

        (0) 
  2. Mohammed Razal V N

    Hi Mark,

    Excellent one !!

    Also do you have a newer version “Tips and Tricks for Optimizing Performance with SAP Sybase IQ

    I downloaded a version  which is available in slideshare, Is that latest one, which was created on November  2012

    Thanks,

    Razal

    (0) 

Leave a Reply