SAP IQ Indexes and Indexing Techniques

markmumy · ‎06-25-2013

Are you new to SAP IQ? Have you used the product but are still not clear on which indexes to use or when to use them? Have you just migrated to SAP IQ 16 and are wondering if there are any indexing changes? This blog will discuss indexes and indexing strategies for SAP IQ.

Basic Indexes

When creating a table in IQ, a default index is placed on every column. This index is referred to as a Fast Projection (FP) index. In order to enable optimized FP indexes, one must set the Minimize_Storage option to ON.

set option public.minimize_storage='on';

From this point, all new tables and columns will use n-bit encoding for highly optimized and compressed data storage and access. Each FP index will be encoded as 1 to 31 bits depending on the data cardinality (2^1 to 2^31 distinct values).
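As a rough worked example of the bit widths above: taking the stated range at face value (n bits cover up to 2^n distinct values), the width n for a column with N distinct values can be estimated as below. This is only a reading of the range quoted in this post, not a statement of how IQ sizes its dictionaries internally.

```latex
n = \max\bigl(1,\ \lceil \log_2 N \rceil\bigr), \qquad 1 \le n \le 31
% Example: N = 1000 distinct values.
% 2^{9} = 512 < 1000 \le 1024 = 2^{10} \;\Rightarrow\; n = 10 \text{ bits per value}
```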

Advanced Indexes

In addition to the basic FP indexes, IQ supports nine other indexes:

High Group (HG) – used for join columns and searchable columns whose data has more than 2,000 distinct values

Low Fast (LF) – used for searchable columns whose data has fewer than 2,000 distinct values

High Non Group (HNG) – used for range searches (except for date, time, and datetime data types) and for single column aggregations

Word (WD) – used for whole word searching of character based data

Text (TEXT) – used for full text searching of character based data

Date (DATE) – used for fields that have a DATE data type (ranges and datepart queries)

Time (TIME) – used for fields that have a TIME data type (ranges and datepart queries)

Datetime (DTTM) – used for fields that have a DATETIME/TIMESTAMP data type (ranges and datepart queries)

Compare (CMP) – used for pre-computing the comparison of two columns in the same table
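As a minimal sketch of how several of these index types are created, assume a hypothetical sales table. The table, column, and index names below are invented for illustration and do not come from any real schema. The statements use the basic CREATE <type> INDEX form; optional clauses (dbspace placement, WD delimiters, TEXT configurations) are left out.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE sales (
    order_id     BIGINT        NOT NULL,
    customer_id  INTEGER       NOT NULL,
    region_code  CHAR(3)       NOT NULL,
    quantity     INTEGER       NOT NULL,
    order_date   DATE          NOT NULL,
    ship_date    DATE          NULL,
    notes        VARCHAR(255)  NULL
);

-- Join column, assumed to have more than 2,000 distinct values.
CREATE HG INDEX sales_customer_id_hg ON sales (customer_id);

-- Searchable column, assumed to have fewer than 2,000 distinct values.
CREATE LF INDEX sales_region_code_lf ON sales (region_code);

-- Range searches and single column aggregations.
CREATE HNG INDEX sales_quantity_hng ON sales (quantity);

-- Whole word searching of character data.
CREATE WD INDEX sales_notes_wd ON sales (notes);

-- Ranges and datepart queries on a DATE column.
CREATE DATE INDEX sales_order_date_date ON sales (order_date);

-- Pre-computed comparison of two columns of the same data type.
CREATE CMP INDEX sales_order_ship_cmp ON sales (order_date, ship_date);
```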

When are indexes used?

In most relational database engines, the optimizer chooses the best single index for a given table in the query. SAP IQ takes the optimization process to the next level by looking not only at each column in the query, but at all joins and search arguments. For example, assume that a date column is used in a join and to search for the month datepart. The optimizer would prefer to use the HG index on the date field for the join. It would also prefer to use the DATE index to search on the month datepart.

In short, IQ will look at each join, where clause, aggregation, etc. to determine the best index for each operation. Not just one index per table and not just one index per column. But rather, one index for each use of each column in the query!
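The date column example above might look like the following sketch. The orders and promo_days tables and their columns are hypothetical. The comments show which index the optimizer would be expected to prefer for each use of the column, following the description in this post. An actual query plan may differ.

```sql
-- Hypothetical tables: orders(order_id, order_date, amount) and
-- promo_days(promo_date, promo_name). Names are illustrative only.
CREATE HG   INDEX orders_order_date_hg   ON orders (order_date);
CREATE DATE INDEX orders_order_date_date ON orders (order_date);

SELECT p.promo_name,
       SUM(o.amount) AS total_amount
FROM   orders o
JOIN   promo_days p
  ON   o.order_date = p.promo_date            -- join: HG index is the candidate
WHERE  DATEPART(month, o.order_date) = 12     -- datepart search: DATE index is the candidate
GROUP BY p.promo_name;
```

Both indexes sit on the same column, and each serves a different operation in the same query.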

Should we be concerned about over indexing?

The short answer is an emphatic NO! Most RDBMS engines have three reasons to limit indexes. First, as mentioned previously, typical relational database engines pick the best single index to access data in a table. Second, when adding indexes to most RDBMS engines the load times tend to slow down due to how data is processed. Third, most database engines store data and indexes in different structures so that as more indexes are added the table size swells. All of these lead to trying to minimize the total number of indexes on a table in most engines.

SAP IQ is not prone to these limitations. We pick as many indexes as needed to process the entire query. Indexes in IQ are not as large as the data being indexed. Each of the indexes employs various levels of vectors, bitmaps, data representations, and compression to dramatically decrease the database size. Finally, when loading data, IQ can consume all CPU resources. We recently released a fully audited benchmark in which SAP IQ loaded data at over 34.3 TB/hour on an 80 core Linux system (http://scn.sap.com/community/developer-center/analytic-server/blog/2013/05/30/sap-sybase-iq-16-sets-...).

All of these SAP IQ features lead to an engine that can handle as many indexes as necessary for all query (canned or ad hoc) activity. It is quite common to have 30-60% of all columns indexed with the advanced indexes (and in some cases multiple indexes per column).

When to use Indexes?

Now that we have some basics about indexes, the big question is when to use them? Here are some simple guidelines that will help cover every situation. The general idea is to ask a series of questions about each column. The answers to the questions will dictate whether or not indexes are needed, and if needed which one(s) are best.

Will this column be used in a JOIN clause?
- NO: Continue to next question
- YES: Use an HG index

Will this column be used as a search column in the WHERE clause?
- NO: Continue to next question
- YES: Use an HG index if the column has more than 2,000 distinct values. If it has fewer than 2,000 distinct values, use an LF index.

Is the column data type a DATE, TIME, TIMESTAMP, or DATETIME?
- NO: Continue to next question
- YES: Use a DATE, TIME, or DTTM index. All date/time/datetime fields should have this index as systems quite frequently use date fields for searching, filtering, and aggregations.

Will whole word searching be done on the character data?
- NO: Continue to next question
- YES: Use a WD (word) index

Will text searching be done on the character data?
- NO: Continue to next question
- YES: Use a TEXT index

Will this column be compared to another column in the same table?
- NO: Continue to next question
- YES: Use a CMP (compare) index that includes both columns

Will this column be used for range searches?
- NO: Continue to next question
- YES: Use an HNG (high non group) index

Will this column be used in aggregations with no other columns?
- NO: Continue to next question
- YES: Use an HNG (high non group) index
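The 2,000 distinct value threshold in the checklist can be checked directly against the data before choosing between HG and LF. The sketch below assumes a hypothetical sales table with customer_id and region_code columns. The names and the cardinalities mentioned in the comments are illustrative only.

```sql
-- Hypothetical table and columns, for illustration only.
-- Count the distinct values in each candidate search column.
SELECT COUNT(DISTINCT customer_id) AS distinct_customers,
       COUNT(DISTINCT region_code) AS distinct_regions
FROM   sales;

-- If distinct_customers comes back above 2,000, use an HG index.
CREATE HG INDEX sales_customer_id_hg ON sales (customer_id);

-- If distinct_regions comes back below 2,000, use an LF index.
CREATE LF INDEX sales_region_code_lf ON sales (region_code);
```

Base the choice on expected cardinality as well as current cardinality, since a column that is small today may grow past the threshold.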

There are two types of indexes in IQ: those that have detailed statistics and those that have minimal statistics. The HG, LF, TEXT, and WD indexes contain detailed statistics. The HNG, CMP, DATE, TIME, and DTTM indexes contain minimal statistics. After walking through the above checklist, you may find that an index with minimal statistics was all that was warranted for the column. Should that situation arise, it is recommended that an index with detailed statistics also be placed on that column. Use an HG or LF index for these detailed statistics. The choice will depend on the current or expected cardinality of data in the column.
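As a small sketch of that pairing, suppose the checklist only called for an HNG index on a hypothetical quantity column and a DATE index on a hypothetical order_date column of a sales table. The companion index choices below assume quantity has fewer than 2,000 distinct values and order_date has more. Both assumptions are illustrative.

```sql
-- Hypothetical table and columns, for illustration only.

-- Minimal statistics index chosen by the checklist (range searches).
CREATE HNG INDEX sales_quantity_hng ON sales (quantity);
-- Companion index with detailed statistics (assumed low cardinality).
CREATE LF INDEX sales_quantity_lf ON sales (quantity);

-- Minimal statistics index chosen by the checklist (date column).
CREATE DATE INDEX sales_order_date_date ON sales (order_date);
-- Companion index with detailed statistics (assumed high cardinality).
CREATE HG INDEX sales_order_date_hg ON sales (order_date);
```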

The process to choose indexes may seem daunting especially if the data model has a significant number of tables and columns. The process outlined above assumes that very little is known about the application and its use of the data. This is, generally, not the case as most DBAs and data modelers do know more about the data and its use.

Should you know more about the model and data, we can simplify the process. I prefer to look at columns that are known to be used in WHERE clauses of most queries and reports. Indexes on join and search criteria are the single biggest factor in performance, so focusing solely on those will allow IQ to be indexed for nearly every situation.

Certainly some columns and indexes will be missed with this approach. However, since the focus was on columns that are heavily used, the impact of missing a few minor, or less frequently used, columns should be relatively low.

Lastly, know that IQ provides a safety net for us! IQ has an index advisor that can be enabled by setting the option Index_Advisor to ON.

set option public.Index_Advisor='on';

Once on, IQ will print index advice to query plans (if they are on) as well as to a system table that can be accessed via the stored procedure sp_iqindexadvice. The index advisor output will note under what circumstance the index is needed. The stored procedure will also aggregate the number of times the advice was made. Should that procedure be used to trend data over a period of time (days, weeks, even months) the number of times the same advice was made can help focus efforts on the most recommended advice.
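A minimal sketch of that workflow follows. It uses only the option and stored procedure named above. The workload in the middle is a placeholder, and the procedure is called with no arguments here.

```sql
-- Turn the index advisor on for all users.
SET OPTION PUBLIC.Index_Advisor = 'ON';

-- ... run the normal query workload for a representative period ...

-- Review the accumulated advice, including how many times
-- each recommendation was made.
CALL sp_iqindexadvice();
```

Capturing this output on a schedule (daily or weekly, for instance) gives the trend described above, so the most frequently repeated advice can be handled first.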
