Scale Out of Union Operators with IQ 16 SP10!
We have introduced an important performance enhancement with IQ 16 SP10 – parallelism of union operators.
Union operators occur commonly in analytics queries for two reasons: combining similar heterogeneous data sources, and partitioning data for ILM (information lifecycle management) purposes. Both use cases can flow large amounts of data through union operators during query processing.
IQ has supported MPP processing for most query operators since the 15.3 release. However, if a query contained unions, there were many cases where the processing could not be parallelized. In SP10, union operators can be fully parallelized and distributed regardless of the number of unions, union arms, or operator types! (Note that some type of reduction such as a Group By, OLAP, or filter is required for good scaling.) This feature further strengthens IQ’s position as a high-performance and cost-effective analytics database.
Unions can be partially parallelized even if some arms cannot allow parallelism or are not worth parallelizing on a cost basis. Union queries are parallelized automatically. No schema changes or tuning required. In-house performance testing has demonstrated linear scale-up with CPU cores (typically 12x-64x) for queries that flow large amounts of data through unions.
This is a completely free and transparent feature that can provide significant performance improvements over previous releases on the same hardware. You can read about all the performance optimization features of IQ 16 SP10 here:
looks like the website housing the link is down. 🙁
seems to be working now
Thanks for this.
I was looking for partitioning my tables which are large and I was under the dilemma of proper partitioning vs union all views.
There was discussion on this on IQUG and the Mark Mumy had suggested that we should use partitioned table. However, it does seem that the new change for union all queries will make union all as a serious contender.
I would appreciate your response on this. The discussion is given below.
http://www.iqug.org/archive/2015-January/004921.html
Thanks
Depends what you are trying to accomplish.
Partitioned tables will create fewer issues with query optimization since they are a simple entity with consolidated metadata and indexes. This optimizer treats them just like non-partitioned tables. Partitioned tables also collocate data in the partitioning key, which can reduce projection IO.
Unioned tables have inherently disjoint data and the optimizer must decide how to consolidate metadata and which operations can be pushed into the view..Union tables can perform better for drop operations since there are no global structures to update. They can also have more complex definitions and be layered.
Performance of union tables should be comparable to partitioned tables provided that the same query optimizations can be applied. The concerns expressed by Mark largely disappear with this parallel union enhancement since IQ can fully parallelize queries that do not have aggregates that can be pushed and flow a lot of data through the onion.
Feel free to have a bake-off between the two solutions and please share your results.