Many of you are aware that SAP set a Guinness record in 2014 for the World’s Largest Audited Data Warehouse at 12PB, with SAP IQ and SAP HANA. During the course of that effort, we determined that an enhancement to the IQ query optimizer would result in radically improved performance for certain kinds of queries on hash partitioned or hash-range partitioned tables. The particular queries I am referring to are applicable to Big Data scenarios, such as analytics on large amounts of sensor data. Here, you might be collecting sensor data at many geographic locations, and want to analyze the data from one particular location only. The data modeling for this use case lends itself to data partitioning – where each partition is mapped to a particular site location and the data in that partition represents the sensor data associated with that location. When doing analytics on sensor data localized to a particular site, database queries that quickly eliminate all partitions but the one of interest will perform dramatically better than a standard query using indexes alone.
The query engine optimization we implemented in IQ 16 SP08 PL3 leverages the information available in a hash-partitioned or a hash-range-partitioned table to improve the performance of some equality conditions. The optimization is applied when the SQL statement contains a condition of the form: <column> = <constant-expression>, and the column is the sole basis for the hash partitioning of the table. When these conditions are true, then the query optimizer will infer an additional condition of the form:
HashPartitionNumber( <column> ) = <integer-constant>
where the integer constant is the hash partition number for the constant expression in the original condition. This condition will appear in the query plan like any other inferred condition. By executing this inferred condition before the original equality condition, the original equality condition needs to scan only the cell values for rows contained within that one hash partition.
We have gathered some impressive performance statistics on large volumes of hash partitioned data with this query optimizer enhancement. Here is a definition of a hash partitioned table:
create table sensor_data (
measurement_datetime datetime NULL,
rig_id VARCHAR(20) NULL,
sensor_id VARCHAR(20) NULL,
PARTITION BY HASH (rig_id);
No additional secondary indices were added to this table. The table was loaded with 1.8 trillion records, and the following test queries were executed, before and after applying the query optimization enhancement:
- select count(*) from sensor_data where rig_id=’rig_29′
- select sensor_id, measurement_value, measurement_datetime from sensor_data where
- select sum(measurement_value) from sensor_data where rig_id=’rig_102′
All of these queries executed between 50 and 200x faster after the query optimizer improvement was applied.
SAP has been investing significant R&D in IQ 16 scale out improvements. The query optimizer improvement described here was released as part of IQ 16 SP08 PL3. It is a testament to the solid design and architecture of IQ that the engineering team was able to implement this enhancement in the span of about 1 month. Note that we have other scale out improvements planned for IQ 16 – most noteworthy is an IQ shared nothing MPP configuration.
We are excited about continuing innovations in SAP IQ, targeted at very large scale data analytics. You can read more about SAP IQ on our SCN site: http://scn.sap.com/community/iq