Comparing SAP HANA and Sybase IQ – real world performance tests
When SAP talks about HANA, they quite often talk about the 1000x improvements in performance that you can get. A customer asked me last week why SAP HANA would provide any improvement over their implementation of Sybase IQ, if they pinned all the IQ tables in memory. They conjectured that IQ should be just as fast as HANA, right?
In fact, there are several capabilities of HANA as compared to IQ, which should make it substantially faster in the real world, even when IQ operates entirely in-memory. There is a nice blog written by Chris Jones which you can read here, which explains this and some other stuff.
The important thing to note is that IQ is a disk-based data-warehouse that works well with large volumes of memory for caching. HANA is a transactional developer platform written to be in-memory. As we shall see, there are pros and cons to both.
– Both HANA and IQ compress data. But, IQ compresses in bitmaps, and HANA compresses with dictionary encoding which means that HANA only needs to decompress exactly the data it needs, and it does so inside the CPU cache. Because in-memory databases are limited by network bandwidth, this should make HANA much faster than IQ for anything which requires materialization of data.
– HANA is optimized for individual writes. With IQ, you lock a table on write, and it is locked until you specify a commit. This means that you can’t have multiple updaters in IQ, whilst HANA has a multi-version concurrency control (MVCC). This means in practice that many people can write to a HANA table with individual writes, whilst IQ requires one person writing to a table, and in batch. It also should mean that IQ is much faster for bulk loading than HANA. In fact, Sybase IQ holds the world record for bulk loading at 34TB/hour.
– Both databases have support for SIMD, but HANA is highly optimized for the Intel E7 platform and its SSE2 instructions, which allow multiple additions/multiplications in one CPU cycle. This should mean that combined with the compression, HANA is faster at aggregating than IQ.
So, I decided to load the same data into IQ and HANA, and do some comparisons on the same hardware.
For this, I used a SAP HANA size “Medium” system from HP. It’s has 4 CPUs, 40 cores, 512GB RAM, 25x15k 146GB SAS for data and one FusionIO 640GB for logs. The OS is SUSE Linux for SAP Applications. For my testing I use one database at a time.
For SAP HANA, I used SAP HANA 1.0 SP6 Rev.69, which is the latest available.
For IQ, I used Sybase IQ 16.0 SP2, which is also the latest available.
It’s my first time getting to grips with Sybase IQ and Mark Mumy was a big help in . IQ doesn’t come pre-configured out the box and you have to set a few settings to get things to work well. In my database configuration file, I set the following settings:
This basically tells IQ to use 1GB for the cache, and to split the remaining 512GB ram amongst the various processes (30% of RAM to each process). Plus because of hyper threading, IQ thinks I have 80 CPUs, so I have to tell it that I actually have 40.
HANA is definitely easier to install, and requires no special configuration, but this isn’t a big deal in the scheme of things.
The Sybase bulk loader is pretty fiddly and very specific about the file format, number of columns and data quality. Actually this is pretty much the feeling of the IQ platform overall – fantastic technology mixed with a relatively poor user experience. The HANA bulk loader isn’t very feature-rich, but it is much less picky than the Sybase loader. This is definitely an area that both platforms could work on.
Once you get IQ up and running though, it flies for loading. I’m sure that it could be better optimized by an IQ pro, but I found I could load my 62GB fact table in 2m30s. By comparison, I need 10x this long to load the data into HANA. This doesn’t come as a surprise, because IQ doesn’t have to worry about multiple inserters, or dictionary encoding. With HANA, you trade off load performance for the behavior of a transactional RDBMS. IQ is a pure data warehouse.
Queries and Aggregations
Vishal Sikka often talks about how HANA can aggregate 16m/sec/core. In my 40-core system that should translate to 640m aggregations/sec/core. I actually find it is much more variable than this and depends on the join complexity and grouping sets. For a simple table, you can get as much as 31m, and for very complex joins and grouping I see as low as 9m. You will see this in the results below.
In both cases, I see massively parallel behavior and all 40 cores are used simultaneously.
Still – with my 1.4bn table, I have four queries. I generally find that most questions you can ask fall into one of these categories in terms of performance.
|SELECT SUM(AMOUNT)/COUNT(AMOUNT) FROM TRANSACTION||1.2s||1.2s||1.2s|
SELECT GENDER, SUM(AMOUNT)/COUNT(AMOUNT)
FROM TRANSACTION T
JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID
GROUP BY GENDER
SELECT MERCHANT, GENDER, SUM(AMOUNT)/COUNT(AMOUNT)
FROM TRANSACTION T
JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID
JOIN MERCHANT M ON T.MERCHANT_ID=M.MERCHANT_ID
GROUP BY MERCHANT, GENDER
SELECT GENDER, SUM(AMOUNT)/COUNT(DISTINCT T.CUSTOMER_ID)
FROM TRANSACTION T
JOIN CUSTOMER C ON T.CUSTOMER_ID=C.CUSTOMER_ID
JOIN MERCHANT M ON T.MERCHANT_ID=M.MERCHANT_ID
GROUP BY MERCHANT, GENDER
select gender, sum(txamount)/count(txamount), count(txamount), count(distinct t.customer_id)
from transaction t
join customer c on t.customer_id=c.customer_id
AND dob BETWEEN ‘1980-01-01’ AND ‘1989-12-31’
AND postcode like ‘%TW1%’
group by gender;
|SELECT STDDEV(TXAMOUNT) FROM TRANSACTION||410s||409s||3.8s|
With IQ, we see a very similar response time to HANA for Query 1. I’m not sure why that is, but I’m guessing IQ does SUM() and COUNT() on a single table very efficiently. I’d be interested in any IQ expert that can explain this.
Once we get into the realm of complex joins and grouping, HANA outperforms IQ 10:1. This is roughly what we expect because HANA stores its data for faster OLAP retrieval.
Interestingly, the last two questions, where we have a lot of restrictions, plus a COUNT DISTINCT, or a STDDEV, IQ outperforms HANA. We see this in a few places, where IQ’s more mature OLAP engine can outperform HANA.
The HANA Development Platform
This little test doesn’t bring out a lot of the qualities of HANA. It’s worth making a quick note of these:
– If we want to do insertion of data at the same time as loading, HANA queries will continue to run very nicely. It’s less clear to me how IQ will behave.
– With HANA, we have a set of engines which run against the core data: predictive, spatial and business functions.
– HANA provides a full development platform including a development IDE, Integration Services and a Web Server for application build.
– With HANA, we need only store one set of data for OLTP and OLAP workloads. IQ only works as a data warehouse and it requires a separate transactional system.
Update June 2014
I’ve rerun the queries on HANA SPS08 and you can see there has been a nice improvement across the board. Particularly noticeable is the decrease in time to run the query with the most restrictions. This will have a major impact on real-world performance and concurrency. We still don’t have STDDEV and other similar calculations running in the column engine, which impacts these sorts of queries.
There’s no question about it – in the real world, with complex grouping sets, HANA performs 10x as well as IQ. It’s worth noting that IQ is a very fast data warehouse – especially compared to a regular RDBMS.
But make no mistake, IQ is an excellent data warehouse, and its more mature OLAP engine means that for certain operations, it can significantly outperform HANA.
With each release of SAP HANA, the development team optimizes more and more functionality and I have no doubt that what we see here may be very different in future revisions. SAP HANA SP07 is released very soon and I’m interested to see what that will bring.
Quick thank you to Mark Mumy for his assistance getting Sybase IQ running nicely!
Nice comparison, John.
For the second to last query, did you try to create a text index in HANA for the POSTCODE column, and use the CONTAINS operator instead? It might deliver a better result.
Sorry it took me a while to get to this - I had a suspicion that I wanted to test. Here's the results!
So you should never mix different engines with HANA, and by using a FULLTEXT index, you mix the Text Engine and the OLAP Engine.
This causes more data to be pulled from the column store and in this example, the query takes 3x longer using the text index and CONTAINS.
Thanks for checking that out.
Don't understand all the technical details, but then I'm not an expert, and I do get the general gist of it and it satisfied my curiousity as to how HANA would stack up to IQ.
all the SQL were executed as they originally are? any additional index was build with IQ such as HG for join column? do you create VIEW in HANA for good performance?
But it is really good for us to understand HANA vs IQ!
Good questions! I've got indexes in IQ but I will be tuning those a bit more.
In HANA I ran these queries with WITH HINT(OLAP_PARALLEL_AGGREGATION) and by creating an Analytic View. Performance is similar in both cases - Analytic Views are 5-10% faster because the OLAP Engine knows in advance how to optimize query execution.
Nice comparision John.
For the last 2 queries where IQ surpassed HANA was little disappointing. But with the future revisions of HANA this might improve and that time please let us know.
It might be pain to you to test the results for each release but the community is expecting more from you (especially me 🙂 )
Thanks for giving opportunity to learn from you by sharing.
Disappointing, perhaps. Surprising - no. Sybase IQ is 20 years old this year, and it has a very mature optimizer and column store engine.
HANA is catching up fast, and I do indeed test almost every revision. There is a nice surprise most months. But the way database optimizers work is you have to optimize for thousands of different scenarios, starting with the most common. That is the approach that the HANA team took and it is now the more obscure scenarios that are not optimized.
Thanks John for testing for every revision.
Sybase IQ also supports multi-version concurrency control (MVCC) and for Sybase IQ 16 RLV (Row Level Versioning) is ready for Simplex.
As we don't understand all the technical details (data volume, table strcuture,I/O Subsystem etc), for a Disk-based RDBMS Sybase IQ could be tunned to improve performance through many ways multiplex, dbspace,index,quer plan etc.
Very good article, waiting for more comparing SAP HANA and other RDBMS(different vendor)
Very nice blog John
Thanks for sharing the performance test
Last query is taking more than 100 times time in HANA, its very surprising
Will be looking forward to your tests with SPS7
Thanks for taking the time to do the benchmark.
Just a few things on the IQ side as this is where my background is. As a long time IQ groupie I was very happy to see that IQ out performed HANA on 1/2 the queries 😎 . As you mentioned I am sure the IQ team will be cross pollinating to the HANA team so that HANA will become faster with each release which is exciting.
I was also surprised that IQ was 10 times faster on the LOAD. You are correct in saying that IQ does not perform well with a high volume insert rate/OLTP environment. In my experience you can do BULK inserts and only 1-3 TPS where as you expect between 50-200k TPS on the BULK Load - obviously dependant on the data and width of the table.
To me the combination of HANA/IQ really is the answer. If you look at the Retail example then IQ cannot handle that high volume POS data streaming into it in real-time. I had a customer like that and we had to poll data up ever 15 minutes and the bulk LOAD into IQ.... but latency proved to be an issue. If you could use HANA for this OLTP scenario and also store the current day/month in HANA and then age the older data into IQ but still make is accessible with Smart Data Access then it is a win/win in my opinion.
The one key factor that always comes into play HANA / IQ question is the cost. In my experience to date and based on pure TCO HANA struggles to win that one. I know it is not a fair comparison but I have found that the people with the cheque books want to see it. If you have a combination of the two then the pricing becomes more compelling as well as the scope of what you can achieve.
I know you are super busy but what would be nice would be to see how the "traditional" RDBMS's commonly used in Analytics solutions go with the same benchmark as I feel that this will really highlight just how good both those technologies are 😀
Thanks Clint - was hoping you would weigh in.
The point of this comparison was to show that even a very high-performing columnar database like IQ, even with all the data pinned in memory, can't get close to the aggregation and grouping performance of HANA.
To get the same performance with IQ, you would need 10x the hardware - which makes HANA very cost-effective if you need the performance. In fact I was impressed by IQ, it was much faster than any other database I have used.
The queries were constructed to prove my points - I'm sure for a mixed workload, it would go much more in HANA's direction 🙂 That said, if you are doing more complex things then IQ's mature engines are impressive.
Yes, you may be right on the data temperature perspective with HANA and IQ. However I would go further - you need all your frequently-used data in HANA. For a financials app like ERP, this would probably be this year and last year. Then we can hold operational data back to the beginning of time in IQ.
I have done a few business cases for BW on HANA with IQ as a cold store - was talking to Karin Tillotson about this a few days ago. The trouble is the complexity of architecture and additional cost this brings means you really need massive data volumes to make this practical from a TCO perspective. For that handful of customers 20-100TB of ERP/BW, this may currently make sense. Otherwise it's cheaper just to put as much as you can in-memory and to use the non-active data concept.
What we really need is a HANA appliance designed for multi-temperate data. The IQ engine becomes part of the HANA master node and data temperature is controlled automatically. Then we just need additional storage on the HANA appliance for IQ. This would fix the TCO problem that I've seen.
I was thinking to do a comparison with Sybase ASE - this would give people a good comparison of what HANA is like against a regular RDBMS. But I can tell you what I will find.
- HANA will perform around 40x faster for aggregation, in throughput, once we have indexes in place. Because most RDBMSs don't parallelize well, this will translate to 100-400x in single-user response times.
- For loading, we will find it is 5x slower than HANA
- We will then go and create a bunch of aggregates to improve performance. These will get the query response time to the same as HANA.
- Then our load performance will tank to about 100x slower than HANA, to deal with index updates and aggregate roll-ups.
Hay John - thanks for the detailed reply ! ... I know you and I can go on for days on this so will keep it short 😆
What I have also seen in this use case is customers just opting for the NLS/cold store in IQ and then keeping there hot data in their existing RDBMS. Ethan Jewett and I recorded a podcast on this and IQ significantly out performed SQL Server in this case. We had SQL Server coming off InfoCubes on the same server as BW and IQ off the DSO's on another Amazon server and the performance increase in IQ was significant. Link here. I was surprised about the TCO component of the NLS storage as we had a second follow up call with SAP around pricing and I was under the impression that there was a "run time" version of IQ just for NLS - but I do my best to stay away from pricing and sales 😉 . Link here. Another positive for customers putting in the NLS now is that they can ring fence their hot data with the NLS in place which has proved to make the business case for HANA investment later down the road more palatable.
Amen - I have been on about this for a while and if SAP can get this right the competitors will struggle to keep up !
Yip ... and not to forget the data explosion that typically happens by just storing the data. Throw in the indexes and you are suddenly managing and environment 2-3 times larger than your raw data
Thanks again for responses and look forward to picking this up in person and the next event
Just add disk based functionality into HANA for data temperature. And you get what you want. Solid architecture on one hardware.
It's logical, stable, scalable. And of course it's cheaper and have a lowest TCO.
It's will be in HANA 2.0 "Data Harmony" (In-Memory and Disk-Based symbiosis). 🙂
Great stuff John.
Now that you have IQ up and running have you had time to do any performance testing of queries that executed in HANA and IQ using HANA SDA? I think this is a really powerful combination and starts to show this multi-temperate physical layer that everyone wants and needs. Roll back the clock to 2008 had companies had detailed level data from the prior recession they could have done a much better job at predicting the business impact. I think that SAP customers are just starting to understand how IQ can be leveraged in a HANA environment.
Hay John B,
I've done some benchmarking on SDA queries between HANA and IQ, on a much smaller data set than this, and the performance has been excellent !
Scan to the 40th minute of this webinar that I did for SAP University Alliances to have a look at the setup and the benchmark queries
And yes - I also agree that SDA is a great solution for customers to look at the HANA/IQ combination.
Can't argue: I've also done some pretty complex real-world scenarios with SDA and I've found it to be awesome tech. And with IQ you will get pretty "warm" cold-store performance.
But, you have the cost and complexity of operating two technology stacks. Right now that's a deal-breaker for many scenarios 🙁
The good news is that SAP knows and is already looking into that.
Chances are we'll see a "HANA disk-based store" native component or something of that sort on the next releases.
Thanks for this.
Hope you executed every query in IQ at least 3 times?
Because IQ must cached query from disk to memory firstly.
And the same in HANA. Executed every query at least 3 times and get the average execution time.
Excellent blog John.
Nice analysis. But I don't completely agree with the results of the "Sybase IQ" column. Unless I configure the database server. 1) Is IQ optimized to run aggregate queries? 2) Is it properly indexed based on the uniqueness in column data? 3) Is the table on which the query is being executed partitioned? 4) Are the appropriate load options used for bulk loading? 5) Did you check for parallelism in IQ and HANA? ** Sorry, for my delayed presence here; also excuse me if my queries are already answered in the above discussion.
Thank you for the research and sharing, very nice and informative blog.
Hi John and Clint,
Thanks for this great post as this is what I am looking for right now.
Keep up the great work!
I would say this is pretty simple queries and the power of IQ is in handling complex queries. HANA is really fast at doing aggregates and the HANA Views are awesome that we do not have in SAP IQ. Both HANA and IQ can be leveraged together so we can use HANA SDA to access data residing in IQ and IQ can also access data residing in HANA as well as HANA views.
IQ is really good for a lift and shift type scenario so you can lift the schema from Oracle, SQL Server, DB2, Teradata, SAP ASE, Vertica, Greenplum, MySQL and with mimimal effort get the reporting running in SAP IQ virtually zero schema changes. Changing the ETL is always a challenge. IQ is really good at doing ETL processing inside the database.
HANA has alot of really nice features to augment IQ so between the two we have the best analytical environment available. IQ and HANA are true columnar implementations and many of the other columnar implementations are not what I would call true columnar. Columnar compression plays a key role in the effeciency of storage and processing. HANA provides true in-memory storage and IQ columns are stored on disk and IQ leverages large memory cache. IQ has statistics built into the column store and so the optimizer does really well on complex queries and complex designs. HANA is really good for provisioning data for multiple analytical use cases. I like them both.
IQ rocks for using with BW as Near-line Storage for the POC's that I have worked on. I think between the two products we cover a wide set of analytical, EDW/DW and SAP use cases.
You almost need to see columnar work to really believe it. One of our SAP partners was doing an IQ demo for a Fortune 50 customer using the customers data and the the customer had them type in a query and when they hit return the customer jerked back in their chair when the results came back so fast. Strip away all that complexity store data at detailed level.
Maybe you can suggest some complex queries that we can test? In my testing, HANA out-aggregates IQ by one order of magnitude, in real world EDW scenarios, on the same hardware.
Yes for BWoH, my current project is using IQ as a NLS and we are archiving terabytes of data right now. We haven't tested it but I expect IQ will be faster than the current DB by an order of magnitude. It doesn't solve the problem of quickly changing data but that's not where IQ's strengths lie.
Could you please make updates results for HANA SP 8 and IQ SP 8?
I've done this for HANA SPS08 but I don't have an IQ box right now. Hope it's interesting for you!
I should add - in HANA SPS07/8, it isn't necessary any more to add hints. The SQL optimizer got a lot smarter.
Along with SAP HANA, SAP IQ can perform as a archive for datas, or in other terms, a SAP IQ can act as a DB for cooler and large data while HANA deals with hot data,taking advantage of its fast processing capabilites.
Yes, IQ is a really nice data warehouse. For slow-moving data it's really ideal, the TCO is lower than HANA for scenarios where performance isn't the primary concern.
With HANA/IQ SPS09 we see dynamic tiering, where HANA and IQ are mated together for hot/warm data. In future revisions I'm certain we will see an integrated database stack with hot/warm/cold data, so we get excellent TCO and excellent performance.
It's been a number of years, are you open to running a similar test against HANA 2 and Sybase IQ with the memscale option to see how things have come along over the last few years?
I imagine it would make an interesting and informative experiment.