I have been involved with the IQ products an intellectual property for 22 years. I have worked with many of our customers over the years, including those that are constantly pushing the limits of technology.
I wanted to put together a quick blog on some of the dos and don’ts with respect to building a system that is based on SAP IQ and its offshoots (SAP HANA dynamic tiering, SAP Nearline Store for Business Warehouse, and SAP Information Lifecycle Management for SAP Applications). Many of the ideas and concepts are captured in the SAP IQ manuals and best practices guides. Think of this as a quick glance or distillation of those topics, not a replacement.
For your reference, these advanced topics have been covered in other areas:
- SAP IQ Hardware Sizing Guide
- Guidance For Building a Petascale SAP IQ Environment
- SAP IQ Indexes and Index Usage
One of the most basic features of any database is often overlooked. Back in the beginning days of computers and software, developers, engineers, and coders where very precise about how the underlying storage structures were defined. Remember Y2K? Systems used two digits to represent a year rather than all four digits. As systems became more powerful and larger, though, we started not to care so much about optimizing storage. Why spend time trying to save megabytes when we have gigabytes of RAM?
In larger systems, though, this oversight can lead to much larger databases and poorer performance. Imagine using a character string (10 digits, yyyy-mm-yy) to represent a data rather than a 4 byte date field. Uncompressed, the data is 2.5 larger than it needs to be.
IQ does an incredible job of compressing data on disk. In reality, the disk space difference for the date use case is minimal. The cardinality is low enough that both definitions would use key-value pairs to represent the data with the keys being based on uniqueness.
Where the datatype size difference matters is in memory and memory processing. Once IQ data is moved to memory, it is uncompressed in various stages. In the worst of all situations, the raw data is loaded into memory rather than the key-value pair. If the table has 1 billion rows, that’s the difference between 10 GB and 4 GB for the two field definitions. A process to scan that data needs 250% more cycles for the wider definition!
When possible, use a right sized type, not just any type that will work. While the impact may not be felt on disk, it can certainly be felt in memory. This is especially true for systems that I’ve seen migrated from the likes of Oracle and Microsoft where it is common to see fields defined as 4,000 or 8,000-byte character fields when just 10-100 characters would suffice. The memory overhead is now not just 2.5x but could be 40-800x!
To Index or Not to Index
SAP IQ has an incredible performance and tuning framework in the form of indexes. In fact, all data is stored in some sort of index structure. Even more incredible is that the optimizer will decide the best index for each search argument or join field (basically, anything in the WHERE clause). It is quite possible for the optimizer to use two or more indexes for a single column! Most database engines use a single index for the entire table!
When building a system that will hold tens of terabytes of data, this is a very good approach to build a high-performance system. What if you are trying to build something that is significantly larger in the hundreds of terabytes or petabyte realm? What if you want to build a system that loads data as quickly as possible?
As the size of the systems grow or the data load times must decrease, over indexing will lead to a larger system or one that cannot load as fast as you wish. In fact, this has lead SAP to change the technology over the years. No, SAP IQ is not moving to an index free or fully automated system. It is, however, being built in such a way that missing an index on a field is not as detrimental as it was prior to SAP IQ 16.0. The default index, n-bit, has been created in such a way that it can compensate for not having a high group or low fast index.
Historically, we follow the guidance I’ve outlined in my blog SAP IQ Index and Index Usage. What I now recommend, is that you don’t over index. Certainly, fields that are used for join criteria should be indexed. Beyond that, though, I recommend only indexing fields that are widely used. For instance, most systems have date fields and use them extensively. Index those fields as they are likely used all the time. However, a geographical region code that is not frequently used doesn’t need an index. I like to call indexes on the infrequently used fields secondary indexes. If you aren’t sure, run a representative query with and without these secondary indexes to gauge the performance difference. More often than not, you will find that they run in about the same time. If the time difference is acceptable, don’t use a secondary inde as it will make your database larger than it needs to be and it will increase the time to load data.
Don’t force OLTP workloads into an EDW engine
As good as SAP IQ is, and it is real good, it is not an OLTP engine. It is not designed to have tens or hundreds of users all trying to change small amounts of data. It complies to the ANSI standards, so it has the syntax to support those workloads. That doesn’t mean that they should be used. SAP IQ shines when you give it a workload that can be run in parallel, data that can be significantly compressed, or a workload that needs to span multiple systems.
Too many times, I have seen customers take existing OLTP style workloads and put them into SAP IQ. They then wonder why a single row operation takes 5-10x longer than before. The OLTP system that you used to run that operation in was tuned for a single operation and designed to run it in a few milliseconds, if that. While SAP IQ can perform the same operation, it is not an optimal workload and may take tens of milliseconds for the same operation.
The opposite is also true. You cannot take an SAP IQ system that loads 1 million rows per second and expect it to perform the same in an OLTP system. Know your workload and know the optimal workloads for your system.
Have Proper Hardware and Performance Expectations
I cannot stress this enough to customers. know your hardware, know your storage, and know your networks. I had a customer come to me and tell me that they needed SAP IQ to load 1 GB of raw data per second for 15 seconds of every minute at peak performance. After looking at their systems, my answer was an emphatic “No, this cannot be done”. They took that to mean that IQ couldn’t do the job. In reality, though, this had nothing to do with IQ and everything to do with their hardware.
First, the SAP IQ host had just one network interface that was running at just 1 Gbit/sec. How can you transfer 1 GB of data per second over a 1 Gbit network?
Second, the SAP IQ storage could handle just 200 MB/sec of total write IO. Assuming a 75% compression on the incoming data (1 GB -> 250 MB), it would take 250 MB/sec of IO.
Lastly, they wanted this done on a virtual machine that had just 6 cores. Those six cores were shared with other VMs. Current sizing tells us that SAP IQ can load 30-40 MB/sec of raw data. With 6 cores, that is just 180-240 MB/sec.
So, no, this wasn’t an IQ limitation but rather a grossly undersized host that could not perform the most basic tasks in the time they required.
“But on my other system…”
Don’t compare SAP IQ to “your other system(s)”. You are using SAP IQ for a reason. Embrace the reason, embrace the change. Constantly comparing it to something else will only slow your process down. I like to tell customers to forget what they know about other technology or other systems. This is not that other system. This is SAP IQ. Be ready to learn something new. Be willing to learn about different technology for different techniques. Be willing to change how the business processes data, but not the final outcome for the business.