
Since its introduction in 2011, SAP has been pushing HANA very heavily, and there is a lot of marketing buzz around this new product. For a freelance consultant focused on SAP Sybase database products, like me, it is next to impossible to ignore HANA in 2013. So, I decided not to rely on marketing slogans and to check what HANA is, what it can do, and, importantly, what HANA is NOT. I put my first impressions into this blog post; hopefully other HANA-related posts will follow. Note that I’m not a HANA expert (yet 🙂 ) and I’m writing these lines as a person with a lot of experience with IQ and some other RDBMSs who is trying to learn HANA.

So, why compare HANA and IQ? Both are designed for data warehouse environments, both are column-based (with some support for row-based data), both provide data compression out of the box, and both are highly parallel. Years ago, much like SAP for HANA today, Sybase claimed that IQ processed data so fast that aggregation tables were not really needed, because the aggregations could simply be performed on the fly. Well, experience with a number of big projects showed me how problematic that statement was, and it is only a single example.

According to SAP, the strong point of HANA is its ability to utilize the CPU cache, which is much faster than accessing main memory (0.5–15 ns vs. roughly 100 ns). Currently, IQ and other Sybase RDBMSs lack this capability. Therefore, I decided to build a test environment that allows running queries that meet a number of conditions:

  • The query should be performed fully in memory, otherwise it is not fair to compare IQ and HANA. In HANA, queries are executed fully in memory once the relevant columns are loaded into RAM.
  • The query should perform a lot of logical I/Os and should be difficult to optimize using indexes. Otherwise, the effect of using the CPU cache may not be clear enough.
  • The query should take at least a few seconds to finish. Since both IQ and HANA (very unfortunately) don’t provide the number of logical I/Os performed by a query, we can compare response times only. If the query finishes in a few milliseconds, the comparison of response times may be problematic.
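As a side note on the first condition: in HANA, the pre-loading of columns into memory can also be requested explicitly and checked afterwards. A sketch, using the test table from this post (the monitoring view is standard, but column values may vary by version):

```sql
-- HANA: request loading of all columns of the table into memory
LOAD CDRs ALL;

-- Check the load status of the table ('FULL', 'PARTIALLY' or 'NO')
SELECT TABLE_NAME, LOADED
FROM M_CS_TABLES
WHERE TABLE_NAME = 'CDRS';
```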

Some notes about the test environment:

For IQ, I used a 16-core RHEL server with hyper-threading turned on (32 cores visible to the OS) and 140 GB of RAM available. I used IQ 16.0 SP01 for my tests.

For HANA, I had to use HANA SPS6 Developer Edition on a Cloudshare VM, which provides HANA on a Linux server with 24 GB of RAM. However, only 19.5 GB is actually available from the Linux point of view (free -m output), and most of this memory is allocated by various HANA processes. In fact, less than 3 GB of RAM is available for user data in HANA. I only wish SAP would allow us to download HANA and install it on any server that meets HANA’s CPU requirements, but it seems that SAP’s policy is to distribute HANA as part of appliances only, so I don’t expect a free HANA download any time soon.

This brings us to an additional requirement for the test: the test dataset should be relatively small, because of the severe RAM restrictions imposed by the HANA Developer Edition on Cloudshare.

Finally, I decided to base my tests on a relatively narrow table that represents information about phone calls (for those in the telecom industry, think of short and very much simplified CDRs). Here is the structure of the table:

create table CDRs (
    CDR_ID    unsigned bigint,  -- Phone conversation ID
    CC_ORIG   varchar(3),       -- Country code of the call originator
    AC_ORIG   varchar(2),       -- Area code of the call originator
    NUM_ORIG  varchar(15),      -- Phone number of the call originator
    CC_DEST   varchar(3),       -- Country code of the call destination
    AC_DEST   varchar(2),       -- Area code of the call destination
    NUM_DEST  varchar(15),      -- Phone number of the call destination
    STARTTIME datetime,         -- Start time of the conversation
    ENDTIME   datetime,         -- End time of the conversation
    DURATION  unsigned int      -- Duration of the conversation in seconds
);

I developed a stored procedure that fills this table in SAP Sybase ASE row by row according to some meaningful logic, and prepared delimited files for IQ and HANA. The input files are available upon request. At first, I planned to run the tests on a dataset with 900 million rows, but I eventually had to go down to 15 million rows because of the VM memory limitations mentioned above.
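My actual procedure is more involved, but the general idea is a simple row-by-row generator. A minimal sketch in ASE-style T-SQL; all the randomization logic here is a simplified placeholder, not the original code:

```sql
-- Simplified sketch: generate pseudo-random CDR rows one by one
declare @i int, @dur int, @start datetime
select @i = 1
while @i <= 15000000
begin
    select @dur   = convert(int, rand() * 3600)                       -- call length, up to 1 hour
    select @start = dateadd(ss, -convert(int, rand() * 2592000), getdate())  -- within last 30 days
    insert into CDRs values (
        @i,                                                           -- CDR_ID
        '972',                                                        -- CC_ORIG
        convert(varchar(2),  convert(int, 50 + rand() * 9)),          -- AC_ORIG
        convert(varchar(15), convert(int, 1000000 + rand() * 8999999)), -- NUM_ORIG
        '972',                                                        -- CC_DEST
        convert(varchar(2),  convert(int, 50 + rand() * 9)),          -- AC_DEST
        convert(varchar(15), convert(int, 1000000 + rand() * 8999999)), -- NUM_DEST
        @start,
        dateadd(ss, @dur, @start),                                    -- ENDTIME
        @dur                                                          -- DURATION
    )
    select @i = @i + 1
end
```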

An important note about terminology: in IQ, inserting data from a delimited file into a database table is called LOAD, and retrieving data from a table into a delimited file is called EXTRACT. In HANA, the inserting is called IMPORT and the retrieving is called EXPORT. The term LOAD in HANA has a totally different meaning – it means loading a whole table, or some of its columns, from disk into memory, when the data is already in the database.
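To illustrate the terminology difference, here is roughly what the same operation looks like on each side. The file paths and delimiter are hypothetical placeholders, and real statements would need more options:

```sql
-- IQ: LOAD inserts data from a delimited file into a table
LOAD TABLE CDRs ( CDR_ID, CC_ORIG, AC_ORIG, NUM_ORIG,
                  CC_DEST, AC_DEST, NUM_DEST,
                  STARTTIME, ENDTIME, DURATION )
FROM '/data/cdrs.txt'
DELIMITED BY '|'
ESCAPES OFF;

-- HANA: IMPORT does the equivalent job
IMPORT FROM CSV FILE '/data/cdrs.csv' INTO CDRs
WITH FIELD DELIMITED BY '|';
```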

IMPORT functionality in HANA is not similar to IQ’s LOAD at all. It actually consists of two phases: IMPORT and MERGE. During the first phase, the data is imported into a “delta store” in an uncompressed form. Then, the data from the “delta store” is merged into the “main store”, where the table data actually resides. The merge is performed automatically when a configurable threshold is crossed (for example, when the size of the “delta store” becomes too big). To ensure that the imported data is fully inside the “main store”, a manual MERGE may be required. The memory requirements during the MERGE process are quite interesting; maybe I will write about them in a different post. It is quite possible that you will be able to IMPORT the data but will not have enough memory to MERGE it; this happened to me a number of times during my tests. I recommend reading more about the HANA architecture here: http://www.saphana.com/docs/DOC-1073, Chapter 9.
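The manual merge mentioned above is a single statement, and the effect can be observed in the monitoring views. A sketch, assuming the table from this post:

```sql
-- HANA: force a merge of the delta store into the main store
MERGE DELTA OF CDRs;

-- Inspect delta vs. main store record counts to see the effect
SELECT RAW_RECORD_COUNT_IN_DELTA, RAW_RECORD_COUNT_IN_MAIN
FROM M_CS_TABLES
WHERE TABLE_NAME = 'CDRS';
```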

Given the significant difference between the test systems (a powerful dedicated server for IQ vs. a small VM for HANA), I didn’t plan to compare data load performance between IQ and HANA. However, so far I see HANA performing the IMPORT using no more than 1.5 of the 4 available cores, thus underutilizing the available hardware. The MERGE phase, though, is executed in a much more parallel way. The bottom line is that IQ seems to outperform HANA in data loading, possibly by quite a lot. I will probably return to this topic in one of the following posts; additional tests with a larger dataset are required.

Now we come to data compression. Since IQ and HANA approach indexing quite differently, I chose to compare the compression without non-default indexes in both IQ and HANA. It appears that IQ provides better data compression: it needs 591 MB to store 15,000,000 rows, while HANA needs 748 MB to store the same data. HANA provides a number of compression algorithms for columns, which are chosen automatically according to the data type and data distribution. However, it seems that none of the compression algorithms offered by HANA resembles the LZW-like compression used by IQ. I’d prefer to test the compression on a more representative data set (15,000,000 rows is way too small) and play with different HANA compression algorithms. I hope one of the future posts will be dedicated to this topic.
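For anyone who wants to reproduce the size comparison, these are the kinds of queries I mean; the exact view and procedure names may vary by version:

```sql
-- IQ: table size report
sp_iqtablesize 'CDRs';

-- HANA: in-memory size of a column table
SELECT TABLE_NAME, MEMORY_SIZE_IN_TOTAL
FROM M_CS_TABLES
WHERE TABLE_NAME = 'CDRS';

-- HANA: per-column compression type chosen by the engine
SELECT COLUMN_NAME, COMPRESSION_TYPE
FROM M_CS_COLUMNS
WHERE TABLE_NAME = 'CDRS';
```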

Finally, the data is inside the database and we are ready to query it. To answer the test conditions mentioned above, I chose the following query:

select a.CDR_ID CDR_ID_1, b.CDR_ID CDR_ID_2,
       a.NUM_ORIG NUM_A, a.NUM_DEST NUM_B,
       a.STARTTIME STARTTIME_1, a.ENDTIME ENDTIME_1, a.DURATION DURATION_1,
       b.NUM_DEST NUM_C,
       b.STARTTIME STARTTIME_2, b.ENDTIME ENDTIME_2, b.DURATION DURATION_2
from   CDRs a, CDRs b
where  a.NUM_DEST = b.NUM_ORIG
and    datediff(ss, a.ENDTIME, b.STARTTIME) between 5 and 60
order by a.STARTTIME;

This query finds cases where a person A called a person B and then person B called a person C almost immediately (within 60 seconds). This query has to perform a lot of logical I/O by its very definition. With my test data set, it returns 31 rows.

In IQ, this query takes 6.6 seconds when executed fully in memory and with all relevant indexes in place. The query uses a sort-merge join and runs with a relatively high degree of parallelism, utilizing about 60% of the 32 available CPU cores.

In HANA, the same query takes only 1 second with no indexes in place! Remember that in my tests HANA is running on a small VM with just 4 virtual CPU cores! The query finishes so fast that I cannot measure the degree of parallelism. Creating indexes on NUM_ORIG and NUM_DEST reduces the response time to 900 ms.

A note about indexes in HANA: HANA offers only two index types and, by default, it chooses the index type automatically. In my tests, I found that indexes improve query performance in HANA, sometimes significantly. Unfortunately, I have not found any indication of index usage in HANA query plans, even when some indexes were definitely used by the query. The role of optimizer statistics in query plan generation is also not very clear to me. I hope to prepare a separate post about query processing in HANA – stay tuned!

Another amazing and totally unexpected finding in HANA: creating an index on NUM_DEST (varchar(15)) takes 194 ms, and an index on DURATION (int) is created in 12 ms!
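For reference, the statements behind these timings are plain CREATE INDEX; HANA picks the index type itself unless told otherwise (the index names here are mine):

```sql
-- HANA: the engine chooses the index type automatically
CREATE INDEX idx_cdrs_num_orig ON CDRs (NUM_ORIG);
CREATE INDEX idx_cdrs_num_dest ON CDRs (NUM_DEST);
CREATE INDEX idx_cdrs_duration ON CDRs (DURATION);
```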

My conclusions so far:

  1. HANA in-memory processing is not just about caching; it is much more than that. HANA allows us to achieve incredible performance for resource-intensive queries. Things that seem impossible with other databases, column-based or row-based, may become possible with HANA.
  2. Loading data into HANA requires careful resource and capacity planning. Merging the inserted data with the rest of the table may require much more memory than you might have thought. In particular, to perform the merge, both the old and the new versions of the table must fit into memory.
  3. It is quite possible that storing aggregations in HANA is indeed not required, at least in most cases. Of course, I need a more representative data set to verify this.
  4. IQ and HANA can be used together in the same system, where they can solve different problems and store different data. HANA is very good for real-time data processing, or for queries that must be executed very quickly. However, in most cases it is not feasible to store a whole multi-terabyte data warehouse in HANA’s memory, at least not in 2013. This is where IQ enters the game. It is very efficient at massive data loading and data storage, and can answer queries with less strict response-time requirements very efficiently. In some scenarios, the raw data can be loaded into IQ and then, after some refining inside IQ, imported into HANA.

This post has been originally published in my personal blog at http://leonid-gvirtz.typepad.com/blog.


9 Comments


    1. Leonid Gvirtz Post author

      Hi Kurt

      Since I’m not sure how to attach a file here, I attached the IQ query plan to my original blog post at the bottom: http://leonid-gvirtz.typepad.com/blog/2013/09/comparison-between-sap-hana-with-sap-sybase-iq-first-impressions.html. Please take it from there. Unfortunately, I can’t find a corresponding plan from HANA, but it was not very informative anyway. Note that index usage and join types are not reflected in HANA plans, at least in the current version.

      My 30 days of HANA Developer Edition are over 😡 , so I don’t have access to HANA, at least for now, and cannot reproduce the test scenario again. Nevertheless, you can reproduce my test case if you like; I can send you my input data if needed.

      Thanks

      Leonid Gvirtz

  1. Haani Hadouka

    Leonid has an exceptionally clear and engaging writing style. What is so impressive is his ability to explain complex technical scenarios in a way that inspires the reader and makes the subject seem as if it’s a “walk in the park”, when we know full well that in reality there is a lot more to it. This ability can only be a result of extensive knowledge, long experience and, of course, a particular speciality of connecting with people.

    Many thanks for your always useful and most interesting contributions Leonid !!

    1. Leonid Gvirtz Post author

      Haani, thank you very much for your feedback! Things like this give me strong motivation to continue.

      Note that I performed some additional tests with HANA before my 30-day trial period finished. I planned to write at least two additional blog posts about HANA and the comparison between HANA and IQ. Unfortunately, I didn’t have time to do it, partly because I need to concentrate on preparing my lecture for SAP TechEd on Oct. 23. I plan to use the opportunity of participating in TechEd to learn more about HANA (and IQ, of course 🙂 ).

  2. Jeff Yoder

    Excellent comparison, but you left out a major part of the equation regarding choice: ROI/cost. I’m willing to bet a VLDB license that HANA’s cost is significantly higher than IQ’s, perhaps even negating any real gain over IQ. From my water-cooler BS chats, I’m told the cost of HANA is beyond any company that doesn’t make the FORTUNE 500 list. Perhaps I’m wrong, but just saying! Great article though!

    1. John Appleby

      It’s a good question and it depends on your metrics for value, of course. I think it’s fair to say that from a cost-per-performance perspective, you will find that IQ is very competitive.

      It’s also fair to say that you’re unlikely to use HANA for scenarios where your data has a low intrinsic value or where performance doesn’t provide any benefit.

      But I wouldn’t confuse that with being possible for only Fortune 500 companies. There are a ton of startups using HANA, Lumira Cloud, HANA Cloud Platform also run on HANA, and you can get into it for $1/hour using HANA One.

  3. John Appleby

    Interesting article. I see that you come at it from an IQ perspective, because you started to create indexes on HANA tables 🙂

    In fact you are right – it is almost never necessary to create indexes on HANA tables. This is because each column is, in effect, an index in its own right, with a sort order and a dictionary key.

    There are some rare Operational/Transactional scenarios where additional indexes may improve the performance of finding individual rows with complex joins.

    I’ve never found an Analytics scenario where performance was improved with secondary indexes.

    John

  4. Shaunak Ashtaputre

    Hi Leonid,

    I was wondering if you’ve done any further such benchmarks or tests with the latest HANA SP?

    Thanks for the informative article.

    Shaunak

    1. Leonid Gvirtz Post author

      Hi

      I understand that it is “a bit” late, but unfortunately I saw the last comment only now 🙁 . No, I have not tried anything similar with recent HANA releases. I wrote my post two years ago, and that is a long time in our industry, both for HANA and IQ. I hope I will have time to perform a similar or even better benchmark in the coming months, adding Amazon Redshift to it as well.

      Leonid

